By Sacha Vignieri


Humans are primates. If we weren't
able to do things like write poetry 
and drive cars, we would likely be 
classified as another species of great 
ape, along with our closest cousins— 
chimpanzees, bonobos, gorillas, and 
orangutans. Thus, understanding the 
genomes, evolutionary history, social- 
ity, and, some might argue, even ecol- 
ogy of modern primates greatly informs our 
understanding of ourselves. 

Species in the order Primates include not
only humans and our closest relatives but also
species that occupy a wide array of habitats,
from savanna to tropical forest and
even to mountainous areas, where
snow is a regular occurrence. In
this special issue, the sequencing of more than
230 primate genomes, globally, reveals patterns
of speciation across the entire order as
well as the contributions of hybridization to
diversification and how adaptations to cold
have contributed to the evolution of social
structure. In addition, the genomes are used
to characterize rare mutations associated with
disease risk in humans.

Primates not only have a past that helps us understand
ourselves but also an uncertain future:
more than 60% are threatened with extinction.
The knowledge gained through this first large
effort to characterize their genomes will, hopefully,
also lead to an increased understanding
of how to conserve the other
members of our own order.

A male olive baboon (Papio anubis) peers curiously into the camera.
The rich diversity of morphology and behavior displayed across primate species provides an informative
context in which to study the impact of genomic diversity on fundamental biological processes. Analysis
of that diversity provides insight into long-standing questions in evolutionary and conservation biology
and is urgent given severe threats these species are facing. Here, we present high-coverage whole-genome
data from 233 primate species representing 86% of genera and all 16 families. This dataset was
used, together with fossil calibration, to create a nuclear DNA phylogeny and to reassess evolutionary
divergence times among primate clades. We found within-species genetic diversity across families
and geographic regions to be associated with climate and sociality, but not with extinction risk. Furthermore,
mutation rates differ across species, potentially influenced by effective population sizes. Lastly, we
identified extensive recurrence of missense mutations previously thought to be human specific. This
study will open a wide range of research avenues for future primate genomic research.


The order Primates includes over 500
recognized species that display an array
of morphological, physiological, and
behavioral adaptations (1). Spanning a
broad range of social systems, locomotory
styles, dietary specializations, and habitat
preferences, these species rightly attract
attention from scientists with equally diverse 
research interests. Because humans are mem- 
bers of the order Primates, we also find many 
important and informative biological parallels 
between ourselves and other primates. The 
analysis of nonhuman primate genomes has 
long been motivated by a desire to understand 
human evolutionary origins, human health, 
and disease. However, past comparative ge- 
nomic analyses have mainly focused on a 
relatively small number of species (2, 3), thus 
providing a limited understanding of genome 
variability in only a few key lineages, such as 
members of the great apes (4-10) or macaques
(11-13). Furthermore, low numbers of wild- 
born individuals in these studies potentially 
result in assessments of diversity that may 
not reflect natural populations (3). To gain a 
more complete picture of how evolution has 
shaped genomic variation across primates,
large-scale sequencing of many species and 
individuals is necessary, especially within pre- 
viously neglected lineages such as strepsirrhines 
(lemurs, lorises, galagos, and relatives) and 
platyrrhines (monkeys of the Americas). The 
need for a more complete understanding of 
primate genetic diversity in the wild, and its 
determinants, is urgent given the current extinction
crisis driven by climate change, habitat
loss, and illegal trading and hunting (14).
At present, 60% of the world's primate species 
are threatened with extinction, and current 
trends are likely to exacerbate the rates of biodiversity
loss in the near future (14, 15). The
analysis of whole-genome sequences allows 
estimation of genetic diversity and evaluation 
of its association with ecological traits, degrees 
of inbreeding, and phylogenetic relationships, 
all metrics relevant to primate conservation 
genomics. 


High-coverage genome sequences of 233 
primate species 


We sequenced the genomes of 703 individu- 
als from 211 primate species on the Illumina 


2 June 2023 


individuals, the available amount of DNA
permitted us to generate polymerase chain
reaction-free libraries. We sequenced paired- 
end reads of 151 base pairs (bp) to an average 
production target of at least 100 gigabases 
(Gb), resulting in an average mapped coverage
of 32.4x per individual (15.3x to 77.6x) (16).
We expanded our dataset by including 106 
individuals representing 29 species from previ- 
ously published studies to maximize phylo- 
genetic diversity (8, 17-24). Altogether, we 
compiled data from 809 individuals from 233 
primate species, amounting to 47% of the 521 
currently recognized species (4). Our sam- 
pling covers 86% of primate genera (69), and 
all 16 families. More than 72% of individuals in 
this study are wild-born. Furthermore, 58% of 
species in our dataset are classified as threat- 
ened with extinction by the International Union 
for Conservation of Nature (IUCN) [i.e., classi- 
fied in the categories vulnerable (VU), endan- 
gered (EN), and critically endangered (CR)], 
and 30 species are critically endangered. It is 
worth noting that among the species we sam- 
pled are some of the world’s most endan- 
gered primates, which face an extremely high 
risk of extinction in the wild. Examples include 
the Western black crested gibbon (Nomascus 
concolor), with an estimated 1500 individuals 
left in the wild and scattered across an array 
of discontinuous habitats, and the northern 
sportive lemur (Lepilemur septentrionalis), with 
roughly 40 individuals estimated to remain 
in the wild, inhabiting an area potentially as 
small as 12 km² (25, 26).

For 100 species, we generated sequencing 
data from more than one individual, and for 
36 species from five or more individuals, 29 
of which belong to newly sequenced species. 
We thus gathered broad primate taxonomic 
coverage by compiling species from all major 
geographic regions currently inhabited by 
primates, including the Americas, mainland 
Africa, Madagascar, and Asia (Fig. 1A). The 
data presented here provide the foundation 
for several additional studies in this issue, in- 
forming important and diverse topics includ- 
ing hybrid speciation and reticulation among 
primates (27) and predicting the landscape of 
tolerated mutations in the human genome (28). 

Owing to technical challenges inherent to 
short-read assembly, we aligned our data to a 
backbone of 32 reference genomes for fur- 
ther analyses, most of which are derived from 
long-read sequencing technologies (16). These
references are well distributed across the pri- 
mate phylogeny and result in a median pair- 
wise distance between the focal and reference 
species of 6.6 x 10~° substitutions per site (0 
to 4.1 x 10°), which is within the range of 
previous projects using a similar approach (8). 
To ensure our estimates of genetic diversity 
over these phylogenetic distances are minimally 
biased, we compared pairs of diversity estimates
in which reads from one species were
mapped to its own reference as well as mapped 
to another species reference. Across 19 species 
pairs that fully cover the phylogenetic distances 
between focal species and reference in our 
data, we find heterozygosity estimates to be 
highly correlated (Pearson’s r = 0.97, p = 6.8 x 
10”). Overall, we find a median value of 2.4 Gb
per individual to be callable across all refer- 
ences, thus enabling genome-wide comparisons. 


Genetic diversity across primates 


Heterozygosity in primates spans over an order
of magnitude, with values ranging from 0.41 ×
10⁻³ heterozygotes per base pair (het × bp⁻¹) to
7.14 × 10⁻³ het × bp⁻¹ (Fig. 1C). We observe the
lowest levels of diversity in the golden snub-nosed 
monkey (Rhinopithecus roxellana) at about one 
heterozygous position every 2400 bp. Only 15 spe- 
cies have a lower median genetic diversity than 
humans, the primate with by far the largest cen- 
sus size. Among these are several Asian colobines, 
but also the aye-aye, the western hoolock gibbon, 
and the Guinea baboon. There are marked dif- 
ferences in genetic diversity across genera, fam- 
ilies, and geographic regions, with high-diversity 
species found among cercopithecines from 
mainland Africa and lemurs in Madagascar 
(Fig. 1B). Among cercopithecines, guenons of 
the genus Cercopithecus are almost exclusively 
responsible for high diversity, with a median
value of 4.54 × 10⁻³ het × bp⁻¹, more than double


the primate-wide median. Some members of 
this tribe also show large historical effective 
population sizes, and there are several known 
instances of past and present interspecific 
hybridization (29-32). We further observe 
high diversity across several genera of lemurs, 
which are among the most endangered pri- 
mates, primarily owing to rapid habitat loss 
and severe population decline. Examples in- 
clude members of the true lemurs (Eulemur 
spp.), bamboo lemurs (Hapalemur spp.), and 
sifakas (Propithecus spp.). 

We investigated whether genetic diversity 
estimates are correlated with extinction risk in 
primates, a subject of previous debate (7, 33, 34). 
Despite our broad sampling, we find no global 
relationship between numerically coded IUCN 
extinction risk categories and estimated het- 
erozygosity [p > 0.05, phylogenetic generalized 
least squares (PGLS)] (Fig. 2A) (16). Because 
genetic diversity is strongly determined by 
long-term demographic history, rapid recent 
population declines such as those currently 
experienced by many primate species are un- 
likely to be detected in a cross-species com- 
parison. Instead, temporal datasets within the 
same species are better suited to quantify 
recent changes in genetic diversity (35). Never- 
theless, comparing genetic diversity for non- 
threatened [least concern (LC), near-threatened 
(NT)] and threatened (VU, EN, CR) species 
within the same family consistently uncovers 
lower diversity among species in the threat- 


ened categories for all families with more than 
one species in both categories, although not 
all comparisons reach statistical significance 
(p < 0.05, Mann-Whitney U test) (Fig. 2B). 
The only exception is Lorisidae, which showed 
no difference in genetic diversity between non- 
threatened and threatened species. 

To further assess the potential impact of 
recent population decline, we analyzed runs of 
homozygosity (RoH) across species. We focused
on tracts with a minimum length of one mega- 
base (Mb), which in humans indicate recent 
inbreeding (8). The order-wide median frac- 
tion of the genome in RoH is 5.1%, and indi- 
vidual values vary substantially, reaching over 
50%. We find critically endangered species, such 
as the white-headed langur (Trachypithecus 
leucocephalus), the eastern gorilla (Gorilla 
beringei), and mongoose lemur (Eulemur 
mongoz), among the species with the highest 
proportion of RoHs (Fig. 2C). However, some 
species not currently classified as threatened, 
such as Azara’s owl monkey (Aotus azarae)
and the northern greater galago (Otolemur 
garnettii), also have a high fraction of the ge- 
nome in RoHs. Although the overall conser- 
vation status of these two species might not 
be worrisome, some individuals may belong to 
smaller local populations, which can exacerbate 
inbreeding. We find 13 critically endangered 
species with lower than the primate-wide av- 
erage fractions of their genomes in RoHs, among 
them the three douc langur species (Pygathrix 
Fig. 1. Genetic diversity in primates across geographic regions and families. (A) Sampling range of
species analyzed in this project. Each point represents the approximate species range centroid of all sampled
species with available ranges. Points are repelled to avoid overplotting. (B) Heterozygosity stratified by
geographic region. Solid black circles and whiskers represent median values and interquartile range.
(C) Median species heterozygosity by family. Solid circles and whiskers represent median and interquartile
range. Solid gray line denotes primate-wide median heterozygosity; dashed and dotted lines denote human
heterozygosity for African and bottlenecked out-of-Africa populations, respectively. Points are colored
according to the family a species belongs to, as denoted on the x axis of (C).


cinerea, P. nemaeus, P. nigripes), red-tailed 
sportive lemur (Lepilemur ruficaudatus), and 
Verreaux’s sifaka (Propithecus verreauxi). We 
find no overall relationship between extinc- 
tion risk and degree of inbreeding deduced 
from the total fraction of the genome in RoHs 
(Pearson’s r = 0.03, p = 0.71). This implies that 
RoHs are not a good predictor of extinction 
risk in primates and suggests that many crit- 
ically endangered species are threatened by 
nongenetic factors, likely reflecting population 
declines that have been too fast to be detect- 
able on the genomic level. Given the potential 
importance of functional variation to conser- 
vation efforts, we sought to quantify the pro- 
portion of loss of functional variation in each 
lineage (34, 36). To this end, we quantified 
stop-gain and missense mutations and normal- 
ized them by the number of synonymous muta- 
tions to account for lineage-specific differences 
in evolutionary rates. We found inverse relationships
between the missense/synonymous ratios
(Pearson’s r = —0.35, p = 9.3 x 10°) and, to a 
lesser extent, stop-gain/synonymous ratios and 
heterozygosity across primates, suggesting ef- 
fects of purifying selection on deleterious va- 
riation, although the latter does not reach 
statistical significance (Pearson’s r = −0.12, p =
0.082). We do not find deleterious variations 
as measured by the stop-gain/synonymous ratio 
to be correlated with extinction risk (Pearson’s 
r < 0.01, p = 0.94). Nevertheless, we caution 
that the varying quality of the references and 
their annotations, together with potential 
changes in gene structure between the refer- 
ences and analyzed species, might add noise to 
the comparisons across our references. 
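
As a rough illustration of the RoH summary used in this section, the sketch below computes the fraction of a genome contained in tracts of at least 1 Mb. The function name, the input format (a list of tract lengths in base pairs), the use of the callable genome size as the denominator, and the example numbers are all assumptions made for illustration; they are not taken from the study's pipeline (16).

```python
def roh_fraction(tract_lengths_bp, callable_genome_bp, min_length_bp=1_000_000):
    """Fraction of the callable genome covered by RoH tracts >= min_length_bp."""
    if callable_genome_bp <= 0:
        raise ValueError("callable_genome_bp must be positive")
    # Keep only tracts long enough to be counted as RoH in this sketch.
    long_tracts = [length for length in tract_lengths_bp if length >= min_length_bp]
    return sum(long_tracts) / callable_genome_bp


# Hypothetical individual: tract lengths are invented for illustration.
tracts = [250_000, 1_200_000, 3_500_000, 900_000, 48_000_000]
fraction = roh_fraction(tracts, callable_genome_bp=2_400_000_000)
print(f"{fraction:.1%}")
```

Only the three tracts of 1 Mb or longer contribute here; shorter homozygous stretches are ignored under the 1-Mb cutoff.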


A time-calibrated nuclear phylogeny 
of primates 


We generated a genome-wide nuclear phylog- 
eny of ultraconserved elements (UCEs) and 
500 bp of their flanking regions, a widely used
marker that enables easy detection of se- 
quence orthologs across species (37). To this 
end, we identified the location of ~3500 UCE 
probes across all primate genomes and gener- 
ated individual gene trees for each locus using 
a maximum-likelihood approach (38-40). We 
used the resulting trees as input for a coales- 
cent analysis to obtain the topology of the 
species tree, which has strong support at most 
nodes and recovers all currently recognized 
primate families, tribes, and genera as monophyletic
(41-44). We used a newly established
set of 27 well-justified fossil calibration points 
to constrain the timing of key phylogenetic 
divergences among different lineages (45). We 
estimate the split between Haplorhini and 
Strepsirrhini to have happened between 63.3 
and 58.3 million years (Ma) ago, and thus the 
radiation of crown Primates is entirely within 
the Paleocene. We find the deepest divergence 
within tarsiers to be notably recent at 15.2 to 
9.5 Ma, which, together with fossil evidence, 
implies considerable extinction along the long 
branch leading to extant tarsiers (46-49). All 
interfamilial relationships within our phylog- 
eny receive strong support [posterior probabil- 
ity (PP) = 1], except for the position of Aotidae 
(owl monkeys), which is weakly supported as 
sister to Callitrichidae (marmosets and tama- 
rins) rather than Cebidae (capuchin and squir- 
rel monkeys) (PP = 0.56). We consider the 
precise relationship among these three fami- 
lies to remain uncertain. Lastly, we estimate 
the human-chimpanzee divergence between 
9.0 and 6.9 Ma, and thus slightly older than 
other recent analyses, although these overlap 
our confidence intervals (41-43). 

Taking advantage of our rich resequencing 
data, we generated a tree topology that in- 
cludes two individuals per species for all spe- 
cies with more than one sequenced individual. 
We observe paraphyletic or polyphyletic place- 
ments of these individuals in 17 species, pos- 
sibly calling several currently established species 
boundaries into question (Fig. 3). These cases 
could result from genetic structure interpreted 
as species delimitation, incomplete lineage sort- 
ing, or hybridization, and most are also ob- 
served at the mitochondrial level (6, 50-53). 
Although some instances of hybridization have 
previously been described, such as among dif- 
ferent species of langurs (54), we find most of 
the paraphyletic or polyphyletic placements 
among platyrrhines. These include 13 species, 
among them capuchins, squirrel monkeys, 
howler monkeys, uakaris, sakis, and titis, and 
point to the need for more taxonomic studies 
using genomic data in this group (55). Finally, 
we retrieve previously unknown phylogenetic 
relationships for species that were sequenced 
for the first time in this study, such as different
species of howler monkeys (e.g., Alouatta
puruensis or A. juara).
Fig. 2. Runs of homozygosity and impact of extinction risk on diversity. (A) Relationship between IUCN extinction risk categories and heterozygosity. Solid black
circles and bars denote median and IQR. (B) Partition into threatened (T: VU, EN, CR) and nonthreatened (N: LC, NT) categories for all families with more than
[...] denote threatened species (VU, EN, CR).


Determinants of diversity and mutation rate 

We used the topology of the species tree and 
614 UCE alignments, for which we had full 
species coverage, to estimate branch lengths as 
the number of substitutions per site. We com- 
bined this with our dated phylogeny and pub- 
lished estimates of generation times to estimate 
mutation rates per generation for all primate 
species from their substitution rates (16). Although
we caution that we cannot rule out
potential biases in these estimates, such as the 
effects of selection or uncertainties in fossil 
calibration, they agree well with published es- 
timates for overlapping species on the basis of 
trio sequencing (Spearman’s r = 0.85, p = 0.02;
Fig. 4C). Our estimated mutation rates (μ) per
generation vary between 0.25 × 10⁻⁸ and 1.62 ×
10⁻⁸ (Fig. 4A), showing a considerably larger
range than previously reported (56). We ob- 
serve the lowest estimate per generation in 
Lemuridae and find highly variable estimates 
across some families such as Cebidae and 
Lorisidae, which also have variable generation 
times (8 to 17 and 4.6 to 9 years per generation, 
respectively). The highest estimates of μ are in
great apes. We find a significant and positive
correlation between μ per generation and the
generation time (Spearman’s r = 0.36, p = 1.89 x
10°), which partly counteracts a generation- 
time effect on the yearly mutation rate. The lat- 
ter is therefore larger in species with a shorter 
generation time (Fig. 4E). Together, variation in
effective population size (Ne) and generation
time explain roughly half of the observed variation
in mutation rates among extant species.
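
The conversion from substitution rate to per-generation mutation rate described above can be sketched as follows. This is a minimal illustration that assumes the rate is obtained by dividing a branch length (substitutions per site) by the branch duration in years and multiplying by the generation time. The function names and all example values are hypothetical, and the authors' actual procedure is the one described in (16).

```python
def yearly_substitution_rate(subs_per_site, branch_years):
    """Substitutions per site per year along one branch."""
    if branch_years <= 0:
        raise ValueError("branch_years must be positive")
    return subs_per_site / branch_years


def per_generation_rate(subs_per_site, branch_years, generation_years):
    """Approximate mutations per site per generation for that branch."""
    return yearly_substitution_rate(subs_per_site, branch_years) * generation_years


# Hypothetical branch: 0.005 substitutions per site accumulated over
# 10 million years, with a generation time of 25 years.
mu = per_generation_rate(0.005, 10_000_000, 25)
print(f"{mu:.2e}")
```

Under this simplification, two species with the same yearly rate differ in their per-generation rate in proportion to their generation times, which is why generation time enters the comparison in the text.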

We used our estimates of μ and estimates of
genetic diversity π based on median heterozygosity
to get an estimate of the effective population
sizes Ne = π/(4 × μ). We find multiple
species belonging to different families of lemurs,
as well as several species of guenons within
the Cercopithecidae, with the largest Ne
estimates, often exceeding 2 × 10⁵ (Fig. 4B). For
several critically endangered lemur species, 
e.g., the northern sportive lemur (Lepilemur 
septentrionalis), the red-tailed sportive lemur 
(Lepilemur ruficaudatus), or the Alaotra reed 
lemur (Hapalemur alaotrensis), these likely 
surpass census sizes by a considerable mar- 
gin. We find multiple members of the genera 
Cercopithecus and Eulemur exhibiting high 
Ne values, which may be driven by interspecific
hybridization observed in these species.
Conversely, we observe comparatively low Ne
estimates in great apes, lorises, and platyr- 
rhines (Fig. 4B) (6). 
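
As a worked example of the relation between diversity, mutation rate, and effective population size used in this section, the sketch below evaluates Ne = π/(4μ). The function name is an assumption, and the mutation rate is a hypothetical value chosen from within the range reported above; only the heterozygosity is taken from the text (the Cercopithecus median).

```python
def effective_population_size(pi, mu):
    """Ne = pi / (4 * mu), with pi per base pair and mu per site per generation."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return pi / (4.0 * mu)


pi_guenon = 4.54e-3  # median heterozygosity reported in the text for Cercopithecus
mu_example = 0.5e-8  # hypothetical per-generation rate inside the reported range
ne = effective_population_size(pi_guenon, mu_example)
print(round(ne))
```

Because μ sits in the denominator, any error in the estimated mutation rate propagates inversely into Ne, which is the source of the caution raised below about testing the drift-barrier hypothesis with these estimates.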

The drift-barrier hypothesis (57, 58) predicts 
that μ per generation should decrease with Ne,
because new mutations affecting fitness are 
predominantly deleterious, and the ability to 
select for lower mutation rate increases with 
the population size. We tested for a relationship
between μ and Ne, while controlling for
the relationship between μ and generation
time in a PGLS model, and observed a significantly
lower mutation rate for species with
higher Ne. We find around 45% of the variation
in μ to be explained by Ne, thus lending
apparent support to the drift-barrier hypothesis
(59). However, we caution that although
this pattern is consistent with the drift-barrier
hypothesis, Ne is estimated by the division of π
by μ, which at least partially explains the negative
relationship. Additionally, our estimates
of μ assume homogeneous levels of evolutionary
constraint on the UCEs and flanking
regions used to estimate divergence time and 
substitution rate. Should there be a strong 
covariation between substitution rates in 
these regions and effective size in branches, 
underlying variation in Ne along the branches
of the phylogeny can act as a confounder of 
apparent variation in mutation rates and thus 
further complicate a formal test of the drift- 
barrier hypothesis. 

To further disentangle what factors might 
contribute to the levels of genetic diversity and 
mutation rates, we compiled a list of 32 traits 
that can be summarized by grouping them into 
the broader categories of body mass, life history, 
activity budget, ranging patterns, climatic niche, 
social organization, sexual selection, diet compo- 
sition, social systems, mating systems, and natal 
Fig. 3. Fossil-calibrated nuclear time tree. Concentric background circles mark 10-million-year intervals; solid gray circles in internal nodes show fossil calibration
points (36); species marked with solid circles at tips show paraphyly or polyphyly when including additional individuals to estimate the topology. 


dispersal mode (60-62). To account for potential 
phylogenetic inertia in trait evolution, we gen- 
erated PGLS models using either genetic diver- 
sity or mutation rate as the response variable 
and individual traits as the predictors. We find 
traits within mating systems, activity budget, 
climatic niche, ranging patterns, and life his- 
tory to be significant predictors of diversity (p < 
0.05), and traits within the former three cat- 
egories remaining so after accounting for multi- 
ple testing (Benjamini-Hochberg correction, false 
discovery rate = 0.05). Species organized in 
single-male polygynous mating systems show
lower diversity than the background (r²_pred =
0.11, p_corr = 1.53 × 10°”), consistent with expectations
of reduced contribution of allelic diversity
from males (63). Within the climatic niche, we 
observe a gradient of diversity declining from 
south to north (r²_pred = 0.28, p_corr = 1.45 × 10°),
which is driven by highly diverse lemur species 
in the Southern Hemisphere. We also find a 
significant correlation with mean temperature 
and amount of precipitation (r²_pred = 0.33,
p_corr = 1.97 × 10 *). It is worth noting that these
measurements are not highly correlated with
each other (Pearson’s r = −0.27 to 0.17), and the
relationships are thus at least partly indepen- 
dent. Lastly, within the activity budget, we 
find the amount of time spent socializing to be 
correlated with diversity (r²_pred = 0.11, p_corr =
5.56 × 10°). However, we caution that the
measurement of activity budget is difficult to 
standardize across species, and interpreting 
this relationship is thus challenging. We find 
no significant impact of life-history traits such 
as body mass or longevity on genetic diversity 
Fig. 4. Estimates of mutation rates and effective population size.
(A) Distribution of estimates of the per-generation mutation rate across primate
families (μ). Large solid circles denote median, and horizontal bars denote
the interquartile range. The gray line denotes the primate-wide median.
(B) Distribution of Ne estimates across primate families. Species with effective
population size above 3 × 10⁵ are highlighted. (C) Comparison of pedigree-based
estimates of μ for great apes (79, 80), olive baboon (81), rhesus macaque (82),
and common marmoset (83) shows a high correlation between the two estimates
(Spearman's r = 0.85, p = 0.02). The open circle denotes the estimate for the


within primates, although body mass is sig- 
nificant before accounting for multiple testing. 
These relationships have been previously de- 
scribed, albeit for broader evolutionary dis- 
tances, including a wider range of genetic 
diversity and body mass (64, 65). We addi- 
tionally calculated the relationship of the traits 
above to our mutation rate estimates. After 
correcting for multiple testing, we did not find 
any significant predictors of μ.
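
The multiple-testing step mentioned in this section can be illustrated with a short sketch of the Benjamini-Hochberg step-up procedure. This is a generic textbook version and not the authors' code; the function name and the example p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is significant at the given FDR."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for five trait predictors.
p_values = [0.001, 0.008, 0.039, 0.041, 0.2]
result = benjamini_hochberg(p_values, fdr=0.05)
print(result)
```

In this made-up example, the third and fourth predictors have p < 0.05 but do not survive the correction, which mirrors the situation described for body mass.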


Variants specific to the human lineage 


Finally, we revisited a previously published 
catalog of 647 high-frequency human-specific 
missense changes, i.e., amino acid-altering
variants that putatively emerged specifically 
in the human lineage and quickly rose to high 
frequency or fixation (66). This catalog was 
mainly defined by looking at derived sites seg- 
regating at high frequency in anatomically 
modern humans, at which archaic hominins 
(Neanderthals and Denisovans) carry the ancestral
allele. Although insufficient to explain
the whole spectrum of human uniqueness, 
such a catalog should contain prime candi- 
dates for some of its molecular underpin- 
nings. We sought to determine how often the 
putatively human-specific derived allele occurs 
at orthologous positions across the genomes of 
other primate species analyzed in this study. 
We find 63% (406) of high-frequency human- 
specific missense changes to occur in at least 
one other primate species and 55% in more 
than two, segregating at high frequency (>0.9) 
within the sampled individuals of a species 
(Fig. 5). This suggests that mutational recur- 
rence generally might be widespread across 
primates. We find mutation pairs in recurrent 
high-frequency human-specific missense changes 
enriched in T-C and A-G mutations, and to a 
lesser extent in C-T and G-A compared with 
nonrecurrent ones. 
mouse lemur (84), which was excluded from the comparison as an outlier
(16). Data for trio estimates were derived from (85). (D) Positive correlation
between estimates of per-generation mutation rates and generation times (g)
(Pearson's r = 0.53, p = 2.1 x 10°”). (E) Inverse relationship between yearly
mutation rate and generation time. Circles in (D) and (E) are colored by the
effective population size Ne (Pearson's r = -0.34, p = 3.1 x 10°’). (F) Relationship
between per-generation mutation rate, adjusted by first regressing the effects
of generation time, and effective population size. The relationship is highly
significant after phylogenetic correction (r² = 0.45, p < 0.001).


We leveraged our data to generate a more 
stringent picture of the mutations that arose 
specifically in the human lineage and have not 
emerged elsewhere in primates. We identified 
alleles present in anatomically modern humans 
at a frequency of at least 99.9% that differ in 
state from a set of four high-coverage archaic 
hominin genomes (67-70). We ensured that
the human allele represents the derived state 
by requiring the ancestral allele to be present 
at a frequency of >99% in a genetic diversity 
panel of 139 previously published great ape 
genomes (8, 9, 71, 72). The resulting 24,374 can- 
didates include a conservative set of 124 mis- 
sense coding mutations affecting 107 different 
genes, among which are 17 previously unde- 
scribed changes affecting 12 genes (66). 
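
The filtering logic described in this paragraph can be sketched as a simple predicate. This is an illustration under stated assumptions, not the authors' pipeline: the `Site` record, its field names, and the requirement that all archaic genomes agree on a single allele are hypothetical choices, and the example sites are invented. Only the two frequency thresholds (at least 99.9% in modern humans, more than 99% for the ancestral allele in great apes) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One candidate position; all fields are hypothetical inputs."""
    human_allele: str
    human_freq: float        # frequency of human_allele in modern humans
    archaic_alleles: tuple   # alleles observed in the archaic hominin genomes
    ape_freqs: dict = field(default_factory=dict)  # allele -> great ape panel frequency


def is_human_specific_candidate(site, min_human_freq=0.999, min_ape_freq=0.99):
    """Apply the three filters described in the text to a single site."""
    if site.human_freq < min_human_freq:
        return False
    archaic = set(site.archaic_alleles)
    # Assumption: the archaic genomes must all carry the same allele.
    if len(archaic) != 1:
        return False
    (ancestral,) = archaic
    if ancestral == site.human_allele:
        return False
    # The ancestral allele must exceed the frequency threshold in great apes.
    return site.ape_freqs.get(ancestral, 0.0) > min_ape_freq


sites = [
    Site("A", 1.0, ("G", "G", "G", "G"), {"G": 1.0}),            # passes all filters
    Site("A", 0.95, ("G", "G", "G", "G"), {"G": 1.0}),           # too rare in humans
    Site("A", 1.0, ("A", "G", "G", "G"), {"G": 1.0}),            # archaic genomes disagree
    Site("A", 1.0, ("G", "G", "G", "G"), {"G": 0.6, "A": 0.4}),  # polymorphic in apes
]
kept = [is_human_specific_candidate(s) for s in sites]
print(kept)
```

A stricter or looser treatment of sites where the archaic genomes disagree would change the size of the candidate set, so that choice should be read as a placeholder.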

We further sought to detect which genes 
have not shown frequent allele recurrence in 
other primate species. To this end, we removed 
variants that we found to reoccur in >1% of 
Fig. 5. Recurrent putative high-frequency
human-specific missense changes. Each bar on 
the x axis represents a high-frequency human- 
specific missense change with the same allele found 
in a different species. Color schemes are the same 
as presented in Figs. 1 and 2. 


species at a frequency of >0.1%. In this set, we 
find 89 missense changes, affecting 80 dis- 
tinct genes. We observe no enrichment for 
functional categories or association to diseases 
among them. Within our catalog, we also find 
the two amino acid differences with demon- 
strated functional differences between hu- 
mans and Neanderthals: The ancestral allele 
in NOVA1 (neuro-oncological ventral antigen
1) leads to a slower development of cortical 
organoids and modifies synaptic protein in- 
teractions (73); the human-derived allele of the 
adenylosuccinate lyase gene (ADSL) leads to a 
reduced de novo synthesis of purines in the 
brain (74). Furthermore, changes in mitotic 
spindle-associated genes previously reported 
to be under positive selection (SPAG5, KIF18A)
maintain their status as distinctively human 
(75). This may have had an impact on neu- 
rogenesis during development (76), although 
this hypothesis has not been experimentally 
validated. We find a specifically human change 
in TMPRSS2, a main factor in the response to
severe acute respiratory syndrome coronavirus 
2 (SARS-CoV-2) infection with known func- 
tional variants that have possibly been under 
selection in some human populations (77). 
Analogous to the above, we additionally gen- 
erated a catalog of sites that are fixed across 
great apes but differ from rhesus macaque 
(Macaca mulatta). Among these 11.2 million 
variants, we find 1 million without observed 
recurrences beyond apes, corresponding to 
mutations specific to the great ape lineage. 
These contain 3792 missense variants affect- 
ing 2970 different genes that are significantly 
enriched for multiple cilia-related functional 
categories, such as axoneme assembly, motile 
cilium assembly, nonmotile cilium assembly, 
cilium-dependent cell motility, and epithe- 
lial cilium movement involved in extracellular 
fluid movement, suggesting that the evolution 
of ape-specific features of cilia has been impor- 
tant in shaping the lineage leading to our own 
species. The disruption of normally function- 
ing cilia can lead to an array of heterogeneous 
pathologies in humans, collectively known as 
ciliopathies. Among 187 genes with established 
links to different ciliopathies, we find 30% to 
be affected by ape-specific missense changes 
(78) (p < 0.01, Fisher’s exact test). More gen- 
erally, we also find an overall significant enrich- 
ment of genes with nonrecurrent ape-specific 
missense changes among genes with disease 
association in OMIM (Online Mendelian In- 
heritance in Man) (p < 0.01, Fisher’s exact test), 
suggesting that—to some degree—variants that 
give rise to the ape-specific phenotype, and 
thus ultimately also to the human one, affect 
a greater proportion of the genes that make 
us susceptible to diseases than would be ex- 
pected by chance. 
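
The enrichment tests reported above can be illustrated with a minimal sketch of a one-sided Fisher's exact test. The text gives 187 ciliopathy genes, about 30% of them affected, and 2970 genes with ape-specific missense changes, but it does not state the size of the background gene set. The background of 20,000 genes and the exact overlap of 56 genes below are therefore assumptions, and the resulting P value is illustrative rather than the authors' figure.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table
    [[a, b], [c, d]]: probability of observing >= a in the top-left cell
    when all margins are fixed (upper tail of the hypergeometric)."""
    row1 = a + b              # e.g. genes in the disease set
    col1 = a + c              # e.g. genes carrying lineage-specific changes
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / denom

# Hypothetical counts: the background size is an assumption, not from the paper.
ciliopathy_genes = 187
affected_ciliopathy = 56      # roughly 30% of 187
genes_with_changes = 2970
background = 20_000

a = affected_ciliopathy
b = ciliopathy_genes - a
c = genes_with_changes - a
d = background - ciliopathy_genes - c
p_value = fisher_right_tail(a, b, c, d)
print(f"one-sided P = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large gene sets; a statistics library would normally be used in practice.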
Phylogenomic analyses provide insights into primate evolution 
Chun-Yan Chen, Xupeng Bi, Xiao-Lin Zhuang, Hong-Liang Zhu, Jiang Hu, Zongyi Sun, 
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding 
the evolution of human genetic architecture and primate diversity. We present such a study of 50 
primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with 
many from previously less well represented groups, the New World monkeys and the Strepsirrhini. 
Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate 
lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, 
skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our 
study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may 
have had an impact on the adaptive radiation of the Simiiformes and human evolution. 


The order Primates contains >500 species 
from 79 genera and 16 families (1), with 
new species continuing to be discovered 
(2-5), making primates the third most 
speciose order of living mammals after 
bats (Chiroptera) and rodents (Rodentia). As 
our closest living relatives, nonhuman primates 
play important roles in the cultures and reli- 
gions of human societies (1). Many nonhuman 
primate species have been widely used as ani- 
mal models because of their genetic, physiolog- 
ical, and anatomical similarities to humans, 
allowing the efficacy and safety of newly devel- 
oped drugs and vaccines to be tested (6). For 
example, since the emergence of COVID-19, 
macaques have served as important models in 
the research and development of vaccines (7-16). 
Primates display considerable morphological, 
behavioral, and physiological diversity and 
hold the key to understanding the evolution 
of our own species, particularly the evolution 
of human phenotypes such as high-level cog- 
nition (17, 18). 

Nonhuman primates occupy a wide range 
of diverse habitats in the tropical forest, savanna, 
semidesert, and subtropical regions of Asia, 
Central and South America, and Africa, and hu- 
mans have spread across much of the earth’s 
surface. Nevertheless, according to the Interna- 
tional Union for Conservation of Nature (IUCN) 
Red Lists, >33% of primate species are critically 
endangered or vulnerable, ~60% are threatened 
with extinction, and ~75% are experiencing 
population decline (1). With global climate 
change and increasing anthropogenic inter- 
ference, the conservation status of primates 
has attracted global scientific and public 
awareness. 


Despite the importance of nonhuman pri- 
mates, reference genomes have been sequenced 
in <10% of species (19-27), which both impedes 
research and hampers conservation efforts. 
Here, we present high-quality reference ge- 
nomes for 27 primate species with long-read 
sequencing generated from our first-phase pro- 
gram of the Primate Genome Project. 


Assembly and annotation of 27 new primate 
reference genomes 


We applied long-read genome-sequencing 
technologies, including PacBio and Nanopore, 
to sequence the genomes of 27 nonhuman 
primate species from 26 genera of 11 families 
(table S1). Long reads were self-polished and 
assembled, and the genome assemblies were 
further corrected and polished by paired-end 
short reads sequenced from the same individ- 
uals (tables S2 to S4). We also used sequencing 
data generated by high-throughput chromo- 
some conformation capture technology (28) 
to anchor assembled contigs into chromosomes 
for four species (fig. S1 and table S4). The sizes 
of the new genome assemblies of the primate 
species under study ranged from ~2.4 × 10⁹ base 
pairs (Gbp) (Daubentonia madagascariensis) 
to ~3.1 Gbp (Erythrocebus patas), which were 
mostly consistent with the k-mer-based es- 
timations (fig. S2 and table S5), with a high 
average contig N50 length of ~15.9 × 10⁶ 
base pairs (Mbp) (table S6). All of the genome 
assemblies yielded BUSCO complete scores 
>92% (table S6). A method that integrates 
de novo and homology-based strategies was 
applied to annotate all genomes with protein 
sequences from human, chimpanzee, gorilla, 
orangutan, and mouse as references for homology- 
based gene model prediction. Between 20,066 
and 21,468 protein-coding genes were predicted 
in these genome assemblies (table S7). Further, 
we also identified ~24.2 Mbp of primate-specific 
highly conserved elements by using whole- 
genome alignments between all primates and 
nine other mammals (fig. S3). 
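
As a reminder of what the contig N50 statistic quoted above measures, the following minimal sketch computes N50 from a list of contig lengths. The toy lengths are hypothetical and unrelated to the assemblies in table S6.

```python
def contig_n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downward until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

# Toy assembly (lengths in bp); real assemblies contain many more contigs.
toy_contigs = [6, 5, 4, 3, 2]
print(contig_n50(toy_contigs))
```

A high N50 indicates that most of the assembly sits in long contigs, which is why it is reported alongside BUSCO completeness.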
The Primate Genome Project also generated 
high-quality reference genomes for another 
16 primate species that were used in the accom- 
panying papers to reveal hybrid speciation 
during the rapid radiation of the macaques 
(29), the homoploid hybrid speciation in the 
snub-nosed monkey Rhinopithecus genus (30), 
social evolution in the Asian colobines driven 
by cold adaptation (31), and the evolutionary 
adaptations of slow lorises (32). All genomic 
data have been published openly and can be 
freely accessed in the National Center for Bio- 
technology Information (NCBI) Assembly 
Database under the accession information de- 
scribed in this study. 


A genomic phylogeny of living primates 


We next performed phylogenomic analyses 
comprising the 27 newly generated genomes, 
another 22 published primate genomes, one 
long-read genome from Nycticebus pygmaeus 
reported in an accompanying paper (32), and 
two close relatives of primates, the Sunda fly- 
ing lemur (Galeopterus variegatus) and the Chi- 
nese tree shrew (Tupaia belangeri chinensis) 
(33), as outgroups (table S8). We constructed 
whole-genome-wide phylogenetic trees using 
ExaML under a GTR+GAMMA model (34). 
Altogether, ~433.5 Mbp of gap-free data for 
syntenic orthologous sequences were retrieved 
Fig. 1. Genomic phylogeny of primates. The maximum likelihood method 
was used to infer the primate species tree from whole-genome sequences 
across 52 species, including 50 primate species and two outgroup species 
(the Sunda flying lemur and the Chinese tree shrew) with 100 bootstraps under a 
GTR+GAMMA model. The divergence time was estimated using fossil calibrations 
(fig. S11) and the MCMCtree algorithm. The yellow and blue species names represent 
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Stephen D. Nash/IUCN/SSC Primate Specialist Group and are used in this 
study with their permission. 
Fig. 2. Reconstruction of primate ancestral chromosomes. (A) Chromosome evolution patterns from the primate common ancestral lineage leading to the 
human lineage. Chromosomes are colored on the basis of human homologies. (B) Karyotype evolution and genome rearrangement. The rates of genomic 
rearrangement are highlighted in black bold font. Chromosome variations from ancestral nodes to derived branches are shown by pathways including chromosome 
reversal, translocation, and fission and fusion events, which are shown by number, e.g., reversal, translocation, fission, and fusion. “HYLPIL” represents the 
gibbon Hylobates pileatus, the genome of which was assembled at the chromosome level. 


from the whole-genome alignments (table S9) 
and used to infer the primate phylogeny, 
yielding a high-resolution whole-genome 
nucleotide evidence tree with identical topology 
to a previous tree derived from 54 nuclear gene 
regions from 186 living primates (35). This 
tree has 100% bootstrap support for all evo- 
lutionary nodes, with the exception of the 
node ((Symphalangus syndactylus, Hoolock 
leuconedys), Hylobates pileatus) among gib- 
bon genera with 90% bootstrap support (Fig. 
1 and figs. S4 and S5). The evolution of gibbons 
has been characterized by their rapid karyo- 
typic changes and remains controversial in 
primate phylogeny at the genus level (24, 35, 36). 
To confirm the phylogeny of this node, we also 
generated partitioned trees with orthologous 
protein-coding genes, exon codons with first 
and second positions, fourfold degenerate sites, 
and conserved nonexonic elements (figs. S6 to 
S9). The tree from conserved nonexonic 
elements yielded a topology for the gibbon 
lineages identical to that of the whole-genome 
nucleotide evidence tree (fig. S9). However, 
the trees from orthologous protein-coding genes 
and exon codons with first and second positions 
and fourfold degenerate sites, respectively, 
supported the alternative topologies, ((Nomas- 
cus, Hylobates), (Symphalangus, Hoolock)) 
and ((Nomascus, (Symphalangus, Hoolock)), 
Hylobates) (figs. S6 to S8). The two topologies 
were shown in previous studies based on var- 
iants called by mapping short reads to the ref- 
erence genome of Nomascus leucogenys (24, 36). 
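
To make the partitions concrete, the sketch below shows one way to split an in-frame coding sequence into first and second codon positions and fourfold degenerate third positions under the standard genetic code. The function names are hypothetical and this is not the authors' pipeline, which is not described at this level of detail here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def is_fourfold_degenerate(codon):
    """True if every base at the third position encodes the same amino acid."""
    prefix = codon[:2].upper()
    return len({CODON_TABLE[prefix + b] for b in BASES}) == 1

def partition_sites(cds):
    """Split an in-frame coding sequence into
    (first+second codon positions, fourfold degenerate third positions)."""
    pos12, fourfold = [], []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        if codon not in CODON_TABLE:      # skip codons containing N or gaps
            continue
        pos12.append(codon[:2])
        if is_fourfold_degenerate(codon):
            fourfold.append(codon[2])
    return "".join(pos12), "".join(fourfold)

print(partition_sites("ATGGCTCTGAAA"))
```

Fourfold degenerate sites are often treated as a proxy for nearly neutral evolution, whereas first and second positions are dominated by amino acid constraints, which is one reason such partitions can support different topologies on short internal branches.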

Our analyses again confirmed the phyloge- 
netic challenge within the gibbon lineage, 
which has experienced pronounced adaptive 
radiation within an extremely short evolu- 
tionary time period (24, 35). Consistently, we 
observed extremely short internal branches 
in this lineage on the phylogeny. A compara- 
tive analysis using CoalHMM (37) across pri- 
mate lineages showed that the gibbon lineage 
represents one of the lineages with the highest 
frequency of incomplete lineage sorting (38), 
supporting a previous study based on popu- 
lation data (24). Specifically, the two gibbon 
branches showed incomplete lineage-sorting 
proportions of 57 and 61%, respectively, but 
the species topology inferred from incomplete 
lineage-sorting analyses was identical to that 
presented herein (figs. S4 and S10). 

Using the whole-genome nucleotide evi- 
dence tree and fossil calibration data (35, 39) 
(Fig. 1 and fig. S11), the divergence dating of 
living primates was estimated by means of the 
MCMCtree algorithm (40) (Fig. 1 and fig. S12). 
We estimated that the most recent common 
ancestor of all primates evolved between 64.95 
and 68.29 million years (Ma) ago, which is close 
to the estimate given in the latest phylogenetic 
study across mammals (41), suggesting that 
the origin of the primate group was near the 
Cretaceous-Tertiary boundary at 66 Ma ago. 
We also estimated that the most recent com- 
mon ancestor of Strepsirrhini appeared between 
52.57 and 56.56 Ma ago, and that of the 
Simiiformes emerged between 35.65 and 
42.55 Ma ago (Fig. 1 and fig. S12). 


Genomic structure and evolution of primates 
Karyotype evolution and genome rearrangement 


The speciation process is often accompanied 
by karyotypic evolution, which also affects ge- 
nome evolution and gene function (42-44). 
We reconstructed the ancestral karyotype evolu- 
tionary process across primate lineages (table 
S10) and observed an overall conserved pat- 
tern of chromosome-level synteny (Fig. 2A). The 
numbers of ancestral karyotypes of Catarrhini 
(2n = 46) and Hominoidea (2n = 48) were con- 
sistent with previous inferences derived from 
the fluorescence in situ hybridization data of 
bacterial artificial chromosomes (45) (Fig. 2A). 
However, we deduced that both of the ances- 
tral karyotypes of primates and Simiiformes 
had a diploid number of 2n = 52 (Fig. 2A) 
rather than 2n = 50 as previously suggested 
(45), recovering a fission event in chromosome 
8 that was observed in the common ancestor 
of primates (Fig. 2A and fig. S13). Fusion and 
fission are the most common mechanisms of 
karyotype evolution in primates, as exemplified 
by the fusion of chromosome 2, which occurred 
specifically in the human lineage (45). Our 
analyses further identified at least one fission 
and one fusion during the emergence of the 
Simiiformes, as well as one fission and four 
fusions associated with the Catarrhini node 
(Fig. 2B and fig. S13), resulting in the con- 
temporary karyotype structure of our own. The 
rapid change of karyotypes in the Simiiformes 
also led to an increased chromosome number 
in New World monkeys, which have the largest 
number of chromosomes across primates. 
We further estimated the rate of genome re- 
arrangement by taking into account all large- 
scale genomic rearrangement events, including 
reversions, translocations, fusions, and fissions, 
in key evolutionary nodes from the primate 
common ancestral lineage leading to the human 
lineage. We observed an increasing rate of 
rearrangement in the Homininae (Gorilla- 
Homo-Pan) (~2.38/Ma) and particularly in 
the Hominini (Homo-Pan) (~5.56/Ma) (Fig. 
2B), which contradicts the Hominini slow- 
down hypothesis on the nucleotide substitu- 
tion rates (35). 


Lineage-specific segmental duplication 


We next compiled segmental duplication 
maps (segmental duplication length ≥5 kbp) 
for primates and five outgroup species (fig. 
S14 and table S11). Compared with other pri- 
mate lineages, we observed a marked increase 
in the number of lineage-specific segmen- 
tal duplications (n = 221) in the great ape 
genomes (Fig. 3A and table S12), consistent 
with previous findings describing a burst 
of segmental duplications in the great ape 
ancestor (46). These specific segmental du- 
plications in great apes overlapped with 57 
protein-coding genes (table S13), 20 of which 
were highly expressed in the human brain 
(fig. S15). We also observed lineage-specific 
segmental duplications in other primate groups 
producing lineage-specific new genes that 
might have contributed to the evolution of 
these lineages (table S13). We further ex- 
plored the functions of all genes overlapping 
segmental duplications in primate genomes 
(table S13) against the Human Gene Muta- 
tion Database (47), and found that a high 
proportion of these genes (52.8%) have been 
reported to be associated with inherited con- 
ditions including autism, intellectual dis- 
abilities, and other developmental disorders 
(Fig. 3B and table S14). 


Evolution of genome size and transposable elements 


Compared with other mammalian groups, the 
primates on average have a relatively large 
genome size (48, 49). Among primates, the 
lemurs (Lemuriformes and Chiromyiformes) 
were found to be characterized by a signif- 
icantly smaller genome size (~2.36 Gbp) than 
other groups such as the lorisoids (Lorisiformes: 
Lorisidae and Galagidae, ~2.70 Gbp), New 
World monkeys (~2.82 Gbp), Old World monkeys 
(~2.91 Gbp), and Hominoidea (~2.96 Gbp) 
(P < 0.05, Mann-Whitney U test) (fig. S16). 
The increase of genome size in the Simiiformes 
can be attributed to the expansion of trans- 
posable elements (figs. S16 to S18 and table S15), 
especially Alu elements, ~300 nucleotide short 
interspersed sequence elements (SINEs) that 
make up ~11% of the human genome (50-54). 
We observed that the genomes of lemurs ex- 
hibited a relative paucity of SINEs, especially 
Alu (~3.87%), which is less than one-third of 
the proportion noted in other lineages (figs. 
S16 to S18). By contrast, the Alu elements in both 
Simiiformes and Lorisiformes experienced 
major bursts of retrotranspositional activity 
at ~40 to 45 and ~34 to 39 Ma ago indepen- 
dently (fig. S19). Specifically, we noticed a 
substantial expansion of the AluS-related 
subclasses, especially AluSz in the Simiiformes, 
whereas the AluJ-related subclasses (especially 
AluJb) were the dominant subclasses of Alu in 
the Lorisiformes (fig. S20). 
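
The genome-size comparison above relies on the Mann-Whitney U test. A minimal sketch of that test follows, using a normal approximation for the P value. The genome sizes are hypothetical placeholders, not the values behind fig. S16, and with samples this small an exact test from a statistics library would be preferable.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value).
    P uses the normal approximation; ties count as 0.5 and no tie or
    continuity correction is applied."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value.
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
              for xi in x for yj in y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean) / sd
    return u_x, erfc(abs(z) / sqrt(2))

# Hypothetical genome sizes in Gbp (illustrative only, not from the paper).
lemurs = [2.30, 2.35, 2.38, 2.41]
simiiformes = [2.80, 2.85, 2.90, 2.95, 3.00]
u, p = mann_whitney_u(lemurs, simiiformes)
print(u, p)
```

Because the test is rank-based, it makes no assumption that genome sizes are normally distributed within each group.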


Variation in the nucleotide substitution rate 


We estimated the overall nucleotide substi- 
tution rate in primates to be ~1.1 × 10⁻³ sub- 
stitutions per site per million years (Fig. 3C, 
fig. S21, and table S16), which is much lower 
than the average rate for mammals (~2.7 × 
10⁻³) and birds (~1.9 × 10⁻³) (55). However, 
the nucleotide substitution rate exhibited a 
high degree of heterogeneity between pri- 
mate lineages, potentially caused by differ- 
ences with respect to life history traits (56-58). 
The New World monkeys evolved the fastest at 
~1.4 × 10⁻³ substitutions per site per million 
years (Fig. 3C and fig. S21). We confirmed the 
hominoid “slowdown” (35, 59-61) hypothesis 
by detecting a reduced substitution rate in 
hominoids (~0.8 × 10⁻³ substitutions per site 
per million years) (fig. S21). Our analysis and a 
previous study (62) suggested that tarsiers, as 
the most basal haplorrhines, potentially 
evolved with a rapid substitution rate com- 
pared with other primates (fig. S21). 


Evolution of protein-coding genes 


We obtained a high-confidence orthologous 
gene set comprising 10,185 orthologs across 
50 primate species, along with the Sunda flying 
lemur and the Chinese tree shrew. On the basis 
of the whole-genome nucleotide evidence tree 
topology of primates, we calculated the ratio of 
the rates of nonsynonymous (dN) to synonymous 
(dS) substitutions for each ortholog to explore 
the evolutionary constraints operating on 
coding regions. We estimated the evolutionary 
rate of tissue-specific expressed genes for 
different tissues across evolutionary clades in 
primates based on the observation that tissue- 
specific expressed genes are generally conserved 
across diverse species (63, 64), and observed 
that testis- and spleen-specific expressed genes 
generally displayed higher values of dN/dS 
(Fig. 3D and figs. S22 and S23) than other 
tissue-specific expressed genes, corroborat- 
ing the rapid evolution of the reproductive and 
immune systems in primates (65, 66). By con- 
trast, brain-specific expressed genes general- 
ly showed a high degree of conservation with 
lower dN/dS values, as previously reported, de- 
spite the rapid evolution of primate cognitive 
functions (67). 

Next, we detected 82 positively selected 
genes in the common ancestral lineage of pri- 
mates by comparison with other mammalian 
species (table S17) using the codeml algorithm 
under the branch-site model with a likelihood 
ratio test in PAML4 (40, 68). We found that 
these positively selected genes were signif- 
icantly enriched in genes exhibiting high-level 
expression in brain, bone marrow, and testis 
(table S18). In particular, close to 37% (30 genes) 
of positively selected genes exhibited biased 
expression in the brain (tables S18 and S19), 
and we found that some of them (e.g., SPTAN1, 
MYT1L, and SHMT1) could have important 
roles in brain function, because deleterious 
mutations of these genes have been reported to 
cause brain disorders (69-71) such as epilepsy 
and schizophrenia. These genes may be impor- 
tant candidates for involvement in the evolu- 
tion of the primate brain because of their 
functional importance. Our results suggest that 
some positively selected genes in the primate 
ancestral lineage may have been involved in the 
rapid evolution of their brain functions de- 
spite the general conservation of brain-specific 
expressed genes. In addition, several immune- 
related genes (e.g., XRCC6 and CD2) (table S17) 
also experienced positive selection in the pri- 
mate ancestor, suggesting that the adaptive 
immune system might also have contributed 
to primate evolution. 
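
The branch-site analysis described above compares a null model with an alternative model that allows positive selection on the foreground branch, using a likelihood ratio test. The sketch below shows the arithmetic of that test under the common convention of a chi-square distribution with 1 degree of freedom. That convention is an assumption about the setup, and the log-likelihood values are hypothetical rather than codeml output from this study.

```python
from math import erfc, sqrt

def lrt_p_value(lnl_null, lnl_alt):
    """Likelihood ratio test with 1 degree of freedom.
    Returns (statistic, P value), where statistic = 2 * (lnL_alt - lnL_null)
    and, for a chi-square with 1 df, P(X > x) = erfc(sqrt(x / 2))."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))   # clip small negative values
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical log-likelihoods for one gene under the two branch-site models.
stat, p = lrt_p_value(lnl_null=-12345.67, lnl_alt=-12340.12)
print(f"2*delta(lnL) = {stat:.2f}, P = {p:.4g}")
```

When thousands of genes are tested, the per-gene P values would normally also be corrected for multiple testing.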


An increased level of genomic change in the 
ancestor of the Simiiformes 


Fig. 3. Structural evolution in primate genomes. (A) Evolutionary pattern 
of lineage-specific segmental duplications in primates. The numbers of 
lineage-specific segmental duplications are shown in red. The largest number 
of segmental duplications was found in the great ape lineage. OWMs, Old World 
monkeys; NWMs, New World monkeys. (B) Example of specific segmental 
duplications during evolution of the genome in Catarrhini. A gene pair 
overlapping the segmental duplication (left, CCL4; right, CCL4L2) is associated 
with HIV susceptibility. The red and green boxes represent the 
segmental duplication region and the overlapping gene pair, respectively. 
(C) Substitution rates across five evolutionary branches in primates. 
(D) Evolutionary constraints of tissues across diverse lineages in primates. 
The evolutionary constraints of tissues are shown by the dN/dS median of 
tissue-specific expressed genes in different evolutionary nodes among 
primates. 


To provide new insights into the genetic 
underpinnings of primate phenotypic evolu- 
tion, we performed various comparative ge- 
nomic analyses, including the identification of 
positively selected genes, genes having con- 
served noncoding regions that have been 
subject to lineage-specific accelerated evolu- 
tion (72), and expanded gene families in different 
primate lineages (68). An increased level of 
genomic evolutionary changes, as reflected by 
the high numbers of positively selected genes, 
lineage-specific accelerated regions, and ex- 
panded gene families, was observed in the 
Simiiformes ancestor (Fig. 4A). Consistently, 
the Simiiformes have also experienced rapid 
evolution of a series of complex traits, unlike 
the Strepsirrhini and Tarsiiformes. For exam- 
ple, the Simiiformes generally exhibit a larger 
brain volume and body mass than the Strepsir- 
rhini and Tarsiiformes (Fig. 4B) (73, 74). Func- 
tional enrichment analyses showed that the 
associated genes relevant to these rapid genomic 
changes in the Simiiformes ancestor (tables S20 
to S22) were overrepresented in functions re- 
lated to the nervous system and development, 
such as postsynaptic density, synapses, and the 
negative regulation of the canonical Wnt signal- 
ing pathway (table S23). 

Additional analyses indicated that various 
candidate genes in the Simiiformes ancestral 
lineage, comprising 168 positively selected genes, 
273 genes associated with lineage-specific ac- 
celerated regions, and 14 expanded gene fam- 
ilies, were enriched in central nervous system 
terms, i.e., brain, cerebrum, cerebellum, hippo- 
campus, and cerebral cortex (table S24). More 
specifically, five genes participate in the axon 
guidance pathway (Fig. 4C) and are expressed 
in the human brain at a high level (table S25). 
Axon guidance represents a key stage in the 
formation of a neural network (75, 76) and may 
have been an important influence on brain 
volume. In this pathway, two semaphorin 
genes that are critical for central nervous 
system patterning (77, 78) stand out: SEMA3B 
experienced positive selection, and SEMA3D 
is associated with a lineage-specific acceler- 
ated region. These 
two genes, together with another three genes 
associated with the lineage-specific acceler- 
ated regions, EPHA3, RAC1, and NTNG2, are 
known to be important for brain development 
(79-81). Furthermore, eight genes were as- 
signed under the term “Hippo signaling path- 
way” (Fig. 4D), an evolutionarily conserved 
signaling pathway that controls organ or body 
size by regulating cell growth, proliferation, 
and apoptosis in a range of animals from flies 
to humans (82-84). Genes involved in neuronal 
network formation and the control of organ 
size appear to have undergone adaptive evo- 
lution in the Simiiformes ancestral lineage and 
may have been responsible for specific pheno- 
typic changes, particularly the progressive 
increase in brain volumes and body sizes com- 
pared with the Tarsiiformes and Strepsirrhini. 

Fig. 4. Genomic changes and phenotype evolution in the ancestor of the 
Simiiformes. (A) Increased level of genomic evolutionary change, including 
positively selected genes, lineage-specific accelerated regions, and significantly 
expanded gene families, seen in the Simiiformes ancestral lineage. The brain 
sizes and brain structures are shown in representative evolutionary groups of 
primates. The brain sizes across primate and outgroup species are derived from 
previous studies (156, 157). Brain images are from the Michigan State University 
Comparative Mammalian Brain Collections (www.brainmuseum.org). (B) Repre- 
sentative phenotype variations, including brain size and body mass, between the 
Strepsirrhini and Tarsiiformes and the Simiiformes. Statistical significance was 
assessed by the Mann-Whitney U test as P < 0.05. (C) Candidate genes involved 
in the axon guidance KEGG pathway (hsa04360). Genes relating to genomic 
changes in the Simiiformes ancestral lineage are shown in this pathway. The 
protein product of the positively selected gene in the Simiiformes ancestral 
lineage, SEMA3B, is shown in red. The protein products of genes associated with 
lineage-specific accelerated regions, EPHA3, RAC1, NTNG2, and SEMA3D, are 
shown in blue. (D) The Hippo signaling pathway (hsa04390), which is involved in 
organ size and body size, with candidates including positively selected genes 
and genes associated with lineage-specific accelerated regions. The gene 
products for positively selected genes (LIMD1, BIRC3, and STK3) in the 
Simiiformes ancestral lineage are shown in red, and the products of genes 
associated with lineage-specific accelerated regions (PATJ, SOX2, BMP2, DLG2, 
and YWHAQ) in the Simiiformes ancestral lineage are shown in blue. (E) Multiple 
sequence alignments of two positively selected genes, TAS1R1 and KIT, along the 
Simiiformes ancestral lineage. The phylogenetic position of the Simiiformes 
ancestor is indicated by a red arrow. 

A major phenotypic difference between 
the Strepsirrhini and Tarsiiformes and the 
Simiiformes is nocturnal versus diurnal life 
history. The visual system has diverged sub- 
stantially between the Strepsirrhini and 
Tarsiiformes and the Simiiformes such that 
the diurnal Simiiformes have much smaller 
corneal sizes (relative to their eyes) and higher 
visual acuity than the Strepsirrhini and Tarsi- 
iformes (85). Consistent with this phenotypic 
difference, we detected positive selection signals 
in three genes, NPHP4, GRHL2, and SLC39A5, 
which are associated with eye development 
(Gene Ontology identifier: 0001654) in the 
Simiiformes ancestral lineage. An intragenic 
deletion in NPHP4 causes recessive cone-rod 
dystrophy with a predominant loss of cone 
function in the dachshund (86). GRHL2 encodes 
a transcription factor that suppresses epithelial- 
to-mesenchymal transition; ectopic GRHL2 
expression caused by mutation accelerates cell 
state transition and leads to posterior polymor- 
phous corneal dystrophy and vision function 
disruption (87). The GRHL2 gene has the highest 
number of positively selected sites in the 
Simiiformes ancestor compared with the other 
genes involved in eye development (fig. S24). 
TAS1R1 encodes a taste receptor that can form 
a heterodimer with TAS1R3 to elicit the umami 
taste (88). We found that TAS1R1 also expe- 
rienced positive selection with four positively 
selected sites in the Simiiformes ancestor (Fig. 
4E). The rapid and concerted evolution of 
taste receptors and vision could have helped 
the diurnal Simiiformes to locate and identify 
food. The detailed functional consequences of 
these amino acid changes might be worthy of 
further study. 

Compared with the Strepsirrhini and Tar- 
siiformes, the Simiiformes generally exhibit 
darker skin pigmentation and a less bright 
coat color (fig. S25) (89). We identified two 
pigmentation-related genes, KIT and CREB3L4, 
that participate in the melanogenesis pathway 
that evolved under positive selection (detected 
by the branch-site model) in the Simiiformes 
ancestor (Fig. 4E). Melanocytes play an im- 
portant role during the formation of skin 
and coat colors in mammals by regulating 
melanin-related genes (90). KIT, a proto-oncogene, 
encodes a receptor tyrosine kinase that reg- 
ulates cell migration, proliferation, and differ- 
entiation in melanocytes and plays a key role 
in melanin deposition (91, 92). KIT also com- 
municates with MITF, a key gene in the forma- 
tion of melanin that regulates the development 
of melanocytes (93-95). 


Genetic mechanisms underlying primate 
phenotype evolution 


Primates have evolved diverse phenotypic 
traits to adapt to their challenging environ- 
ments. Here, we sought to investigate the 
evolution of complex phenotypes in the brain, 
skeletal system, digestive system, and sense 
organs, as well as body size, in primates. 


Brain evolution 


In primates, brain volumes range from <~2 cm³ 
in the mouse lemur to ~1300 cm³ in humans (73). 
To reveal the genetic changes that might under- 
lie brain evolution in primates, we detected 
signals of positive selection in brain develop- 
ment genes using a branch-site model in PAML 
in key evolutionary nodes in the primate phylo- 
geny. A total of 34 brain genes were found to be 
under positive selection in one of the primate 
evolutionary nodes (table S26) (68). Four of 
them, SLC6A4, NR2E1, NIPBL, and XRCC6, 
were under positive selection in the common 
ancestor of all primates, whereas 30 were under 
positive selection in other primate ancestral 
nodes leading to the evolution of humans 
(table S26). These results appear to suggest 
that primates underwent continuous brain 
evolution over an extended period of evolu- 
tionary time. Knockout experiments in mice 
on many of these positively selected genes have 
shown brain function impairment. For instance, 
the NIPBL gene interacts with ZFP609 to 
regulate the migration of cortical neurons, and 
its mutations are frequently involved in brain 
neurological defects encompassing intellec- 
tual disability and seizures (96). We identified 
two amino acid residues in the NIPBL protein 
that experienced adaptive change in the com- 
mon ancestor of all primate lineages (fig. S26). 

Microcephaly is characterized by severe 
neurological defects, the small brain size being 
caused by a disturbance of the proliferation of 
nerve cells (97). Some genes involved in micro- 
cephaly have been proposed as candidates for 
involvement in the evolution of brain size 
(98-100). We also searched for positive selec- 
tion signals in the 1113 coding genes involved in 
microcephaly (g:Profiler identifier HP:0000252). 
In total, 65 positively selected genes with 
functional roles in microcephaly were
identified along the primate ancestral lineage leading
to the human lineage (table S27), suggesting
that microcephaly genes may have been in- 
volved in the marked evolutionary expansion 
of brain size that characterizes primates, es- 
pecially in those crucial evolutionary nodes 
characterized by a sharp increase in the de- 
gree of cortical folding (gyrification) and brain 
volume (101).

We next sought to investigate the roles of 
regulatory elements in the evolution of pri- 
mate brain size. We first identified noncoding 
regions that were highly conserved and under 
strong purifying selection across all primates 
and detected signals of accelerated evolution 
in four lineages: the Simiiformes ancestor (table 
S21), the Catarrhini ancestor (table S28), the
ancestor of great apes (table S29), and the
human lineage (table S30), representing
crucial evolutionary nodes for the enlargement
of primate brain size (101) (fig. S27). These 
lineage-specific accelerated regions should 
be under strong positive selection specifically 
in the targeted lineages and might contribute 
to the adaptation or innovation of these line- 
ages (72). We found 15 genes associated with 
lineage-specific accelerated regions in the com- 
mon ancestor of the great apes that showed 
particularly high expression in the human 
fetal brain (fig. S27 and table S31) (P = 0.023, 
modified Fisher’s exact test). More than half of 
these genes have been reported to have roles 
in brain development and function (102-109). 
For example, knockout of the transcription 
factor-encoding MEF2C in a mouse model 
resulted in impaired neuronal differentiation 
and smaller somal size among neural progenitor 
cells (108). Coincidentally, the lineage-specific 
accelerated region of this gene was detected in 
the great ape ancestral lineage. The DLG5 
gene, which is required for the polarization 
of citron kinase in mitotic neural precursors, 
also contains a lineage-specific accelerated 
region in the great ape lineage, and DLG5−/−
mice have smaller brains and thinner neo- 
cortices (109, 110). 

We further investigated the evolution of 
neurotransmitters, which mediate the
neurogenesis process (111, 112) and also play a role
in the regulation of brain size (111). We de- 
tected 12 positively selected genes and 39 genes 
associated with lineage-specific accelerated re- 
gions in the ancestral nodes leading to the hu- 
man lineage that were found to be involved in 
the release, transportation, and reception of 
neurotransmitter signals (Fig. 5A and fig. S28). 
These genes participate in diverse neuro- 
transmitter systems: glutamatergic, dopamin- 
ergic, cholinergic, and GABAergic synapses 
and the synaptic vesicle cycle. Among these, 
five positively selected genes and 33 genes 
associated with lineage-specific accelerated 
regions are highly expressed in the human brain 
(table S32). It is likely that at least some of 
these genomic changes affecting the neuro- 
transmitter signaling pathway might have 
played a role in primate brain evolution. 


Evolution of the skeletal system and limbs 


The arboreal lifestyle coevolved with adaptive 
changes of the skeletal system and limb devel- 
opment. Genes functioning in bone develop- 
ment are likely to have been especially important 
for the adaptive radiation of the primates. We 
identified four positively selected genes, PIEZO1, 
EGFR, BMPER, and NOTCH2, that were involved 
in bone development (113-116) in the ancestral
lineage of primates (table S17). Bone develop- 
ment requires the recruitment of osteoclast 
precursors from the surrounding mesenchyme, 
thereby actuating the key events of bone growth, 
such as marrow cavity formation, capillary

Fig. 5. Associations between genomic evolutionary characteristics and
phenotypic traits in primates. (A) Positively selected genes and genes 
associated with lineage-specific accelerated regions from the primate ancestral 
lineage leading to the human lineage that are involved in transport, release, 
and receptors in neurotransmitter signaling. (B) The NEK1 gene, which is involved
in upper limb bone development, was under positive selection with three 
positively selected sites in the gibbon ancestral lineage. The gibbon ancestor is 


invasion, and matrix remodelling. The mechan- 
ical sensing protein PIEZO1 accommodates 
bone homeostasis through osteoclast-osteoblast 
cross-talk (113). Osteoclasts then influence oste- 
oblast formation and differentiation through the 
secretion of some soluble factors (117). EGFR 
negatively regulates mTOR signaling during 
osteoblast differentiation to control bone devel- 
opment (114). The NOTCH2 gene regulates 
cancellous bone volume and microarchitecture 
in osteoblast precursors (116, 118). 

Although tails vary in length and shape 
across the primates, they generally play key
roles in relation to locomotion (119). This
notwithstanding, the tail was lost in some primate
lineages, including the common ancestor of 
the apes (120, 121). We retrieved 151 genes as- 
sociated with lineage-specific accelerated re- 
gions in the common ancestral lineage of 
the apes (table S33), including KIAA1217 (sickle
tail protein homolog) (figs. S29 and S30).
Mutations in KIAA1217 are associated with
malformations of the notochord and caudal 
vertebrae in humans, and in mice they affect 
the development of the vertebral column, lead- 
ing to a characteristic short tail due to a 


shown in red. (C) Eight positively selected genes and genes associated with 
lineage-specific accelerated regions from the great ape ancestral lineage involved 
in the TGF-β, Wnt, and Hippo signaling pathways. (D) Positively selected genes
and genes associated with lineage-specific accelerated regions involved in the 
evolution of the digestive system in the Colobinae ancestral lineage. Genes 
marked in red and blue represent positively selected genes and genes associated 
with lineage-specific accelerated regions, respectively, in this lineage. 


reduced number of caudal vertebrae (122, 123). 
Thus, the lineage-specific accelerated region 
may serve as a regulator of the expression of 
KIAA1217, because this lineage-specific
accelerated region, residing in the vicinity of KIAA1217
in the ape lineage, overlaps with the enhanc- 
er EH38E1455433 (pELS) (fig. S31). High- 
throughput chromosome conformation capture 
data (fig. S32) also showed that this lineage- 
specific accelerated region is located in the 
same topologically associated domain as 
KIAA1217, suggesting that they may physically 
interact with each other. Furthermore, the
lesser apes (gibbons) are of particular interest
because of their dominant locomotor style, 
brachiation (124, 125). This locomotor adap- 
tation was accompanied by the acquisition of 
distinct morphological characteristics,
particularly the elongated forelimb, representing
one of the most intriguing phenotypic traits in
gibbons that enables them to travel through 
the canopy at high speed (126). We found that 
positive selection has operated on four genes 
related to upper limb bone morphology in the 
gibbon ancestral lineage (table S34). Of these,
NEK1, which encodes a serine or threonine
kinase, contains the most positively selected 
sites (Fig. 5B). Functional studies have shown 
that genetic variants in this gene can influ- 
ence bone length and shorten the humerus 
and femur in humans (127, 128). Therefore,

Fig. 6. Demographic history of nonhuman primates. (A) Primate species
grouped according to their biogeographic distribution (Africa, Asia, or South
America). The plot shows the normalized demographic history of all species
within each biogeographic region. The normalized Ne was inferred by dividing the
estimated value of Ne for each species at each time point by its maximum
value. Callithrix jacchus was removed from this analysis because the genome
was derived from an inbred individual. The time period from 50,000 to 20,000
years ago (late Pleistocene) is indicated by a gray background. (B) Correlation
analysis between nucleotide diversity and Ne after phylogenetic correction
using the Ape library in R (http://ape-package.ird.fr/). Ne represents the median
value of effective population size for each species 20,000 years ago. (C) Nearly 
half (n = 20) of all nonhuman primate species experienced a continual decline 
in Ne over the past 3 million years. These include the 13 critically endangered or 
endangered species shown in red. The IUCN Red List status is shown for each 
species in the inserted plot: CR, critically endangered; EN, endangered; VU, 
vulnerable; NT, near threatened; and LC, least concern.

positive selection acting on genes related to
upper limb bone morphology may have been 
important in the acquisition of the elongated 
forelimb, a key adaptive trait for the unique 
brachiating locomotion style of gibbons. 


Evolution of body size in primates 


Like other mammalian groups (129, 130), 
extant primate species exhibit a large range of 
body sizes, from dwarf galagos and mouse 
lemurs (~60 to 70 g) at one end of the spectrum 
to male gorillas (>200 kg in some individuals) 
at the other (131). Thus, primate body size has
experienced significant divergence, particu- 
larly for the great apes with their substantial 
enlargement in body size. We detected several 
positively selected genes in the common an- 
cestors of the great apes that might have con- 
tributed to the evolution of this trait. DUOX2 
encodes a protein involved in a critical step 
of thyroid hormone synthesis, and muta- 
tions in DUOX2 are known to cause decreased 
body size in mouse and panda (132, 133). This 
gene experienced strong positive selection 
in the great ape ancestral lineage (P = 0.018, χ²
test) (Fig. 5C and table S35). Additionally, we 
found several genes involved in the
transforming growth factor-β (TGF-β) signaling
pathway (e.g., LTBP1) or the Wnt signaling 
pathway (e.g., MBD2, YAP1, and DISC1), two of
the best known pathways participating in bone 
development and body size (48), that were 
either under strong positive selection in the 
great apes or had lineage-specific accelerated 
regions in this lineage (Fig. 5C and tables S29 
and S35). 
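
The χ² test reported for positive selection is, in a typical branch-site analysis, a likelihood ratio test between a null model and an alternative model that allows positive selection on the foreground branch. The following minimal sketch assumes one degree of freedom, which is a common convention but is not stated in this section. The function name and the log-likelihood values are hypothetical and are not taken from this study.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P value of a likelihood ratio test, assuming 1 degree of freedom."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        # The alternative model fits no better than the null model.
        return 1.0
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for one gene on one foreground branch.
p_value = lrt_pvalue_df1(lnl_null=-10234.8, lnl_alt=-10231.8)
print(f"LRT P value: {p_value:.4f}")
```

A gene would be reported as a candidate only if this P value passes the chosen significance threshold, usually after correction for multiple testing.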

Several positively selected genes and genes 
associated with lineage-specific accelerated 
regions in the great ape ancestor were also 
significantly overrepresented in the Hippo 
signaling pathway (P = 0.045, modified Fisher’s 
exact test) (table S36), which has been impli- 
cated in the determination of organ and body 
size (82). When combining all positively selected 
genes, genes associated with lineage-specific 
accelerated regions, and expanded gene fami- 
lies in the Simiiformes ancestral lineage, which 
markedly increased their body size compared 
with non-Simiiformes lineages (Fig. 4B), we also 
detected diverse candidate genes with adaptive 
changes in the Hippo signaling pathway. These 
results indicate potentially important roles for 
the Hippo pathway in body size changes in 
these two nodes during primate evolution. 
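
Pathway overrepresentation of this kind is usually assessed with a hypergeometric (one-sided Fisher's exact) test. The sketch below shows only the standard test; the modification applied in the study is not described in this section and is not reproduced here. All counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background: int, n_pathway: int,
                      n_candidates: int, n_overlap: int) -> float:
    """One-sided P(X >= n_overlap) under the hypergeometric distribution."""
    upper = min(n_pathway, n_candidates)
    total = comb(n_background, n_candidates)
    # Sum the probability of observing n_overlap or more pathway genes
    # among the candidates when candidates are drawn at random.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_candidates - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 150 in the pathway,
# 200 candidate genes, 5 of which fall in the pathway.
print(enrichment_pvalue(20000, 150, 200, 5))
```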


Evolution of the digestive system 


Primate lineages have evolved diverse dietary 
habits and specialized digestive functions 
(134). In particular, leaf-eating colobines, an 
African and Asian subfamily (Colobinae) of 
Old World monkeys, have evolved a uniquely 
specialized and compartmentalized foregut 
in which there are discrete alkaline and acidic 
sections to cope with their folivorous diet
and microbial fermentation can take place
(135, 136). Although colobines eat leaves, fruits, 
flowers, and seeds, they typically focus much 
of their feeding time on leaves (estimated 
range: ~34 to 81% of their annual diet) (135).
Accordingly, these leaf-eaters are well adapted 
in terms of meeting their energy metabolism 
requirements and balancing micronutrients 
and protein intake while also dealing with the 
toxins contained in their food plants (137). 

In the ancestor of the Colobinae, we identi- 
fied a number of pivotal digestive genes that un- 
derwent positive selection (table S37). Acyl-CoA 
dehydrogenase, encoded by the ACADM gene, 
is an important lipolytic enzyme that catalyzes 
the initial step in each cycle of mitochondrial 
fatty acid β-oxidation and plays a key role in
metabolizing fatty acids derived from ingested 
foods (138). Energy-rich short-chain volatile 
fatty acids are produced by the microbial fer- 
mentation process and absorbed by the host, 
thus making an important contribution to the 
energy budget of colobines (135). Therefore, 
rapid evolution of this gene, with two posi- 
tively selected sites (V75M and A138C), may 
have been important for the absorption of fatty 
acids by colobines (Fig. 5D and fig. S33). NOX1,
which is highly expressed in the colon, was
identified as being under positive selection in
the ancestor of the Colobinae (Fig. 5D and
tables S37 and S38). NOX1-dependent reactive
oxygen species production can further regu- 
late microorganism homeostasis in the ileum 
of mice (139). The rumens of ruminants and 
the saccus stomachs of colobines have devel- 
oped a similar adaptive strategy to allow the 
microbial fermentation of high-fiber foods, 
and therefore are an example of convergent 
evolution. We found that MYBPC1, which has
been shown to contribute to morphological
and functional differences in the bovine
rumen (140), also underwent positive selection
in the ancestor of the Colobinae (Fig. 5D and 
table S37). In addition, 100 genes associated 
with lineage-specific accelerated regions were 
identified in the ancestral lineage of the 
Colobinae (table S39). Several of these genes 
were also highly expressed in the stomach, 
colon, pancreas, and small intestine (Fig. 5D 
and table S38). Of these, RNASE4 encodes a 
vital digestive enzyme, pancreatic
ribonuclease 4, and is a paralog of RNASE1, which is
known to have undergone adaptive evolution 
by gene duplication in leaf-eating colobines 
and howler monkeys (26, 141). Colobines may 
therefore have acquired adaptations to allow 
them to digest fatty acids and ribonucleic 
acids, and their unique foregut and intestinal 
microbiota enabled them to cope with their 
folivorous diet. 


Evolution of sensory organs 


In many mammals, olfaction is the dominant 
sense and provides much of the sensory
information upon which animals rely to navigate,
forage, and avoid predators or for social behav- 
ior and courtship (134). Most Strepsirrhini 
species are nocturnal, whereas most Simiiformes 
are diurnal with well-developed color vision 
systems attuned to their priorities in diurnal 
activity (142-145). By contrast, olfactory sen- 
sitivity appears to have decreased in the 
Simiiformes compared with the Strepsirrhini 
(134, 146, 147). Consistent with these findings, 
we found that the copy number of several 
specific olfactory receptor gene families was 
significantly reduced in the Simiiformes. For 
example, the olfactory receptor gene family 
OR52A underwent a significant contraction in 
the Simiiformes (40 species), with only ~0.7 
copies on average, in contrast to the ~3.4 av- 
erage copies in the Strepsirrhini (nine species) 
(figs. S34 and S35) (P = 4.072 x 10°°, Mann- 
Whitney U test). Anatomically, Strepsirrhini 
are characterized by the presence of a rhinar- 
ium, a moist and naked surface around the tip 
of the nose that is present in most mammals, 
including dogs and cats, but has been lost in 
the Simiiformes (134, 147). Olfactory bulb 
volume, which correlates with olfactory re- 
ceptor neuron population size, is also larger 
in the Strepsirrhini than in the Simiiformes 
(146, 148). The LHX2 gene, which partici- 
pates in olfactory bulb development (149, 150), 
experienced positive selection in the
ancestor of the Strepsirrhini (P = 0.03, χ² test;
table S40). 
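
The copy-number comparison between the two clades uses a Mann-Whitney U test. The sketch below shows how such a test can be computed with a tie-corrected normal approximation and no continuity correction, which is an assumption rather than the study's exact procedure. The copy numbers are hypothetical placeholders, not the values behind figs. S34 and S35.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    # Each pair contributes 1 if the x value is larger and 0.5 if tied.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    tie_sizes = Counter(list(x) + list(y)).values()
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))
    if sigma == 0.0:
        # All observations are identical, so there is no evidence of a shift.
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical OR gene family copy numbers per species.
strepsirrhini = [3, 4, 3, 5, 2, 4, 3, 3, 4]
simiiformes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 2]
u_stat, p_value = mann_whitney_u(strepsirrhini, simiiformes)
print(u_stat, p_value)
```

With small samples an exact test is preferable to the normal approximation used here.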


Demographic history of nonhuman primates 


The IUCN lists more than one-third of pri- 
mates as critically endangered or vulnerable 
(1). To evaluate the effects of climate change 
and human activity on the recent population 
declines in these primates, we inferred their 
demographic histories over the past million 
years by using the pairwise sequentially
Markovian coalescent model (151) for each species
in this study (fig. S36 and tables S16 and S41). 
Our data showed that most nonhuman primate 
species experienced rapid population declines 
during the late Pleistocene (Fig. 6A and fig. S37), 
consistent with the record of a large mass extinc- 
tion of mammals during this period (48, 152). 
Although we did not observe a significant 
difference between endangered species and 
other species in terms of nucleotide diversity 
(fig. S38 and table S42), we did detect a
significant positive correlation between the
median effective population size (Ne) over the
past ~20,000 years and nucleotide diversity 
(P = 0.002, Pearson’s product-moment corre- 
lation after phylogenetic correction) (Fig. 6B 
and table S42), indicating a long-term effect 
of Ne decline on the loss of genetic diversity.
According to the historical demographic pat- 
terns, we further clustered all nonhuman pri- 
mate species with similar trends of historical 
Ne, and found that 20 species experienced a
continual Ne decline over the past 3 million
years (Fig. 6C). Sixty-five percent of these species 
are now listed as endangered or critically 
endangered (Fig. 6C and fig. S39). This ratio is 
twice that of the remaining species, suggesting 
that the prehistoric environmental effects (e.g., 
habitat fragmentation) (26) may also have 
driven population decline and contributed 
to the current endangered status of these 
species well before human interference in 
the modern era. 
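
The normalization used for Fig. 6A divides each species' Ne trajectory by its own maximum so that species with very different population sizes can be plotted on one scale. The sketch below illustrates that step together with a simple monotonic check for continual decline. The check is a simplified stand-in for the clustering of trends described above, and the trajectory values are hypothetical.

```python
def normalize_ne(ne_values):
    """Scale an Ne trajectory so that its maximum equals 1."""
    peak = max(ne_values)
    return [value / peak for value in ne_values]


def is_continual_decline(ne_oldest_to_recent):
    """True if Ne never increases from the oldest to the most recent time point."""
    series = list(ne_oldest_to_recent)
    return all(later <= earlier for earlier, later in zip(series, series[1:]))


# Hypothetical Ne estimates ordered from oldest to most recent.
trajectory = [40000, 32000, 32000, 15000, 6000]
print(normalize_ne(trajectory))
print(is_continual_decline(trajectory))
```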


Conclusions 


Understanding the evolution and genetic basis 
of human-specific traits requires a systematic 
comparison of genomes along the primate 
lineages. Previous studies of primate genomes 
have focused on genomic changes in the hu- 
man lineage that influenced brain functions 
and other traits (120, 153-155). Our comparative 
phylogenomic analyses across primate lineages 
have revealed some of the accumulated genomic 
changes at different primate ancestral nodes 
that may have contributed to the evolution of 
unique human traits. Of particular interest, we 
report a hitherto unreported increase in the 
rate of genomic change in the Simiiformes 
common ancestor that may have played a role 
in the later diversification of Simiiformes and 
the evolution of humans. Our comparative 
genomic analyses also yielded insights into the 
genetic basis of phenotypic diversity across 
primate lineages. With the rich diversity of 
morphology and physiology among nonhuman 
primates, further genomic analyses covering 
all primate species will provide an indispens- 
able resource for comparative studies allowing 
expansion of the scope of biomedical research 
programs using primates as model systems. 
Further, increased knowledge of the genomic 
makeup and variations of nonhuman primates 
should help to identify risk factors for genetic 
disorders and enhance wildlife health man- 
agement in both wild and captive members of 
these species.

INTRODUCTION: Incomplete lineage sorting
generates gene trees that are incongruent with 
the species tree. Incomplete lineage sorting 
has been described in many phylogenetic clades, 
including birds, marsupials, and primates. For 
example, the level of incomplete lineage sort- 
ing in the human-chimp-gorilla branch adds up 
to ~30%, which means that, even though our 
closest primate relatives are chimps, 15% of 
our genome resembles more the gorilla than the 
chimp genome, and another 15% groups the 
chimp with the gorilla first. 


RATIONALE: Although incomplete lineage sort- 
ing is usually regarded as an obstacle for phy- 
logenetic reconstruction, it holds valuable 
information about the evolutionary history 
of the species because its extent depends on 
the ancestral effective population sizes and 
the time between speciation events. Addition- 
ally, recurrent ancestral selective processes 
are expected to influence how the proportion 
of incongruent trees varies along the genome, 
which makes incomplete lineage sorting a 
useful tool to study ancient evolutionary events. 
In this study, we estimate the incomplete lineage 
sorting landscape by running a coalescent 
hidden Markov model in species trios along a 
50-way primate genome alignment. We then 
leverage the signal of incomplete lineage sorting
to reconstruct ancestral effective population
parameters and to analyze the genomic
determinants that influence the sorting of
lineages.


Inference of the speciation history
and the genomic landscape of
natural selection in primates from
patterns of incomplete lineage
sorting. CoalHMM was used to capture
the signal of incomplete lineage sorting
(ILS) segments along the genomes
of 50 primate species and to estimate
coalescent parameters, i.e., the ancestral
effective population sizes and speciation
times. Moreover, the genome-wide
variation in the levels of incomplete
lineage sorting allowed for the inference
of selective processes in primates. ChrX,
X chromosome.


RESULTS: We find widespread incomplete line- 
age sorting across the primate tree in 29 nodes, 
some reaching as much as 64% of the genome. 
Combining CoalHMM with a machine learning 
pipeline, we reconstruct the speciation times 
of the primate phylogeny without the need for 
fossil calibrations. Our speciation time estimates 
are more recent than divergence times, and 
they are in agreement with previous estimates 
based on fossil evidence. Our reconstructed 
ancestral effective population sizes show that 
they increase toward the past. 

We additionally detect regions that have 
low or high incomplete lineage sorting levels 
consistently across several nodes. We show 
that incomplete lineage sorting proportions 
increase with the recombination rate in the 
genomic region—a difference that translates 
into an up to fourfold variation in the inferred 
local effective population size. Moreover, we 
report low levels of incomplete lineage sorting 
on the X chromosome. This reduction is more 
pronounced than expected under neutral evo- 
lution, which suggests that selective forces 
affect the X chromosome more strongly than 
the autosomes, reducing the effective
population size of the X chromosome and,
subsequently, the levels of incomplete lineage
sorting.

We further assess how selection affects the 
distribution of incomplete lineage sorting pat- 
terns by comparing the incomplete lineage 
sorting proportions of exons with those in 
intergenic regions. We find that there is an 
overall decrease in the levels of incomplete 
lineage sorting in exons that amounts to a re- 
duction of 31% in the local effective popula- 
tion size as compared with intergenic regions. 

Finally, we perform a gene ontology enrich- 
ment analysis on low- and high-incomplete 
lineage sorting genes. We find that immune 
system genes show large proportions of in- 
complete lineage sorting for many of the nodes, 
whereas housekeeping genes with basic cell 
functions show a lack of incomplete lineage 
sorting. 


CONCLUSION: Most molecular-based methods 
that aim at timing a species tree provide es- 
timates of divergence times, which are con- 
founded by ancestral population sizes compared 
with the actual speciation times. We showed 
that using the coalescent theory and the sig- 
nal of incomplete lineage sorting allows us to 
accurately estimate speciation times and an- 
cestral population sizes in the primate tree, 
gaining key insights regarding some aspects 
of primate biology. Our study also
emphasizes the prevalence of natural selection at
linked sites that shapes the landscape of both 
genetic diversity and incomplete lineage
sorting along the primate genome.

Incomplete lineage sorting (ILS) causes the phylogeny of some parts of the genome to differ from
the species tree. In this work, we investigate the frequencies and determinants of ILS in 29 major
ancestral nodes across the entire primate phylogeny. We find up to 64% of the genome affected by ILS
at individual nodes. We exploit ILS to reconstruct speciation times and ancestral population sizes.
Estimated speciation times are much more recent than genomic divergence times and are in good
agreement with the fossil record. We show extensive variation of ILS along the genome, mainly driven
by recombination but also by the distance to genes, highlighting a major impact of selection on
variation along the genome. In many nodes, ILS is reduced more on the X chromosome compared
with autosomes than expected under neutrality, which suggests higher impacts of natural selection
on the X chromosome. Finally, we show an excess of ILS in genes with immune functions and a deficit
of ILS in housekeeping genes. The extensive ILS in primates discovered in this study provides
insights into the speciation times, ancestral population sizes, and patterns of natural selection that
shape primate evolution.


Comparative genomics can offer insights
into population processes deep in phy- 
logenetic history. As a result of recom- 
bination, different parts of our genomes 
have different genealogical histories (1, 2). 
Therefore, when speciation occurs, the genes 
of the resulting descendants can be traced 
back to different ancestors, each coalescing 
at different times that stochastically depend 
on both the species population size and nat- 
ural selection acting on each gene. If the time 
between two consecutive speciation events is
short and/or the effective population size (Ne)
is large, then genes from the two most closely 
related species may coalesce deeper in the past 
than the time of the oldest speciation event. 
This can result in genealogical histories that 
are different from the species tree—a phenome- 
non called incomplete lineage sorting (ILS). 
ILS has affected the evolutionary history of the 
human genome as well as many other groups 
(3-5). Around 30% of the human genome does 
not follow the ((human, chimpanzee), gorilla) 
speciation tree (2, 6-8), with 15% of nucleotide 
positions grouping human and gorilla, and 
15% grouping gorilla and chimpanzee. 
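
The dependence of ILS on the time between speciation events and on Ne can be made concrete with the standard neutral coalescent result for three species, in which the expected discordant fraction is (2/3)·exp(−t/(2Ne)), with t measured in generations. This formula is textbook coalescent theory and is an assumption here, because the text above does not state it. It ignores selection and the recombination structure that CoalHMM models, and the function names are illustrative.

```python
import math


def ils_fraction(generations_between_splits: float, ne: float) -> float:
    """Expected fraction of the genome with a discordant gene tree."""
    # The two lineages fail to coalesce within the internal branch with
    # probability exp(-t / (2 Ne)); two of the three possible topologies
    # that can then arise are discordant with the species tree.
    return (2.0 / 3.0) * math.exp(-generations_between_splits / (2.0 * ne))


def coalescent_units_from_ils(ils: float) -> float:
    """Invert the formula to obtain t / (2 Ne) from an observed ILS fraction."""
    return -math.log(1.5 * ils)


# About 30% ILS, as quoted for the human-chimpanzee-gorilla trio, corresponds
# to an internal branch of roughly 0.8 coalescent units under this model.
print(coalescent_units_from_ils(0.30))
```

Because only the ratio t/(2Ne) is identifiable from the ILS fraction alone, additional information such as segment lengths or divergence is needed to separate speciation time from ancestral population size.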

Although the phylogenetic incongruences 
produced by ILS can hamper gene tree recon- 
struction from single loci, they offer an oppor- 
tunity to learn about the population history 
of species sitting in deep ancestral branches 
of the phylogeny (6, 9-17). We can, for exam- 
ple, estimate the actual times when species 
split as opposed to the more ancient average 
time to the most recent common ancestor, and 
we can measure how natural selection, directly 
or indirectly, affected the genomic diversity of 
the ancestral species. For example, Dutheil et al. 
(12) have concluded that the lack of ILS on
the X chromosome in the human-chimp an- 
cestor first reported by Patterson et al. (13) 
was likely a result of several episodes of very 
strong positive selection. 

The recent effort to de novo assemble a large 
number of primate genomes makes it possible 
to extend the study of ILS to many more nodes 
across the primate phylogeny, allowing esti- 
mation of the speciation times and the forces 
that shaped genetic diversity in the ancestral
species. With many independent replicates of
the ILS process, we can learn about common
targets of natural selection during primate
diversification. In this work, we apply an
extended version of the CoalHMM model (14) to
a whole-genome alignment of 50 primate spe- 
cies (10 prosimians, 7 New World monkeys, 
23 Old World monkeys, and 10 great and 
lesser apes). We report high levels of ILS on 
29 of the total number of internal branches, 
and we estimate dates of the speciation times 
independently of fossil calibration that are in 
concordance with available fossil evidence. 
Additionally, we report recombination rate, 
ancestral effective population sizes, and se- 
lection as major genomic and functional de- 
terminants that have shaped the patterns of 
ancestral primate diversity. 


Results 
ILS is pervasive on most branches 
of the primate tree 


We applied CoalHMM to the internal branches 
of the primate tree for 50 species used in Shao et al. 
(15) and shown in Fig. 1A, using combinations 
of quartets of species from the genome-wide 
alignment (see the supplementary materials, 
section 4). After filtering out ambiguously 
aligned regions, we used posterior decoding to 
infer segments of the alignment best supported 
by either the species topology or any of the two 
possible discordant topologies. Figure 1A shows 
the level of autosomal ILS detected on indi- 
vidual branches of the phylogeny. Branch lengths 
represent estimated genomic divergence times 
obtained by dividing substitution rates of the 
ExaML Gamma model by an estimate of the 
yearly mutation rate of each branch (supple- 
mentary materials, sections 3 and 7). We found 
appreciable genome-wide ILS proportions be- 
tween 5 and 64% on 29 of the 49 branches, 
which implies that, on these branches, a large 
proportion of the genome follows a different 
gene genealogy from that of the species tree 
(Fig. 1A). The length distribution of the ge- 
nome segments supporting the discordant 
topologies (i.e., topologies V2 and V3 in Fig. 1A, 
inset) depends mainly on the effective popu- 
lation size of the examined branch and is 
expected to follow a geometric distribution. 
Except for a deficiency of very short segments, 
this assumption is generally met in our anal- 
ysis (fig. S7). We also show that the mean 
length of segments supporting both the spe- 
cies topology and the discordant topologies 
varies substantially among nodes, with mean 
lengths for discordant segments between 100 
and 1000 base pairs for individual branches 
(fig. S7). This shows that single genes, which 
typically cover >20 kb in the genome, rarely 
have just one phylogenetic history when ILS is 
prominent. 
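
Two calculations in this paragraph can be sketched directly. The first is the conversion of a branch length in substitutions per site into years by dividing by a yearly mutation rate. The second is the probability that a single discordant segment spans a whole gene if segment lengths follow a geometric distribution with a given mean. The numeric inputs are hypothetical round numbers, not estimates from this study.

```python
def divergence_time_years(subs_per_site: float, mutation_rate_per_year: float) -> float:
    """Convert a branch length (substitutions per site) into years."""
    if mutation_rate_per_year <= 0.0:
        raise ValueError("mutation rate must be positive")
    return subs_per_site / mutation_rate_per_year


def prob_segment_longer_than(mean_length_bp: float, length_bp: int) -> float:
    """P(segment length > length_bp) for a geometric distribution on 1, 2, ..."""
    p_end = 1.0 / mean_length_bp  # per-base probability that the segment ends
    return (1.0 - p_end) ** length_bp


# Hypothetical branch: 0.006 substitutions per site at 5e-10 per site per year.
print(divergence_time_years(0.006, 5e-10))

# A discordant segment with a mean length of 1000 bp almost never covers 20 kb.
print(prob_segment_longer_than(1000.0, 20000))
```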

A previous study based on the phylogenies 
of 1700 genes concluded that hybridization

Fig. 1. Phylogenetic tree of primates, with scaled divergence times or
speciation times as branch lengths. (A) Divergence time tree scaled with 
estimated mutation rate for individual branches (supplementary materials, 
section 7). Percentage ILS (the sum of V2 and V3 topologies; see inset) is 
plotted as branch color and marked with numbers for those branches with >5% 
of ILS. Only the subset of 38 species that were used to infer the ILS of the 
colored branches are plotted for clarity. The two columns on top of each branch 
show the relative frequency of bases attributed to V2 and V3, respectively. 

The numbers in red denote individual branches referenced in subsequent figures. 
The taxonomic classification is shown to the right of the phylogeny. (B) Speciation 
time tree with branch lengths in units of million years (MYA) as estimated 
from CoalHMM and scaled with estimated ancestral mutation rates for individual 
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branches (supplementary materials, section 7). The annotations in colored 
rectangles refer to the inferred ancestral effective population sizes. Branches without 
enough information to infer speciation times using CoalHMM (i.e., branches with 
<5% ILS) are shown as dashed lines. Here, speciation times are instead estimated 
by subtracting an assumed population size (the CoalHMM estimate of the 
ancestral population size of the closest branch) from the divergence time 
rescaled by mutation rate per generation (supplementary materials, section 10). 
The inset panel shows the correlation between the split times estimated by 
CoalHMM and the dated fossil record. Each point corresponds to an evolutionary 
node in the right panel. Horizontal lines correspond to the bootstrapped standard 
deviation of the estimated branch length, and vertical lines represent the 
standard deviation of the fossil date estimates (data are shown in table S4). 
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events are as common in the deeper branches 
of the primate tree as they are today between 
related extant species of many primate groups 
(6, 17). To estimate to what extent the phylo- 
genetic incongruence that the model attributes 
to ILS is affected by widespread hybridization 
on the deeper branches, we investigated the 
relative frequency, nucleotide divergence, and 
length of the genomic fragments assigned to 
the two discordant topologies on each inter- 
nal branch. If explained by ILS, the three mea- 
sures should all be equal for the two discordant 
topologies, whereas hybridization is expected 
to cause one of the discordant topologies to 
be more frequent, and the genomic segments 
supporting the predominant topology should 
be, on average, longer and less divergent than 
those supporting the other discordant topol- 
ogy. On most of the 29 internal branches, we 
observe near-equal proportions of genomic 
positions assigned to the two discordant to- 
pologies (see the proportions of V2 versus 
V3 in Fig. 1A), and we find that the fragments 
have very similar size distributions (fig. S7). 
After correcting for different substitution rates 
(supplementary materials, section 9), we also 
find that segments with the two discordant 
topologies are close to equally divergent (figs. 
S17 and S19). Exceptions to these general pat- 
terns are found within the recent macaque, 
gibbon, and lesser apes divergences. In these 
cases, evidence of introgression has also been 
reported previously (16, 18, 19). However, even 
in those cases, ILS is the predominant cause 
of incongruent genealogies in the primate 
tree (20) (supplementary materials, section 9). 
It is possible that hybridizations occurred be- 
tween related species in deeper branches, as 
is observed in several extant genera. How- 
ever, if a pair of hybridizing species did not 
both leave extant descendant species (as is 
likely because most species die out), this 
would not have been distinguishable from 
deep coalescences in causing ILS. Thus, we 
cannot completely exclude that gene flow 
occurred at ancestral branches—only that it 
did not leave detectable evidence of ancient 
hybridization. 
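
The symmetry argument in this paragraph can be illustrated with a short sketch. Under ILS alone the two discordant topologies are expected in equal proportions, so an exact two-sided binomial test against 0.5 flags branches where one topology dominates. The function name and the idea of counting independent segments (rather than linked positions) are illustrative assumptions, not the procedure used in the supplementary materials.

```python
from math import comb

def discordance_asymmetry(n_v2: int, n_v3: int) -> tuple[float, float]:
    """Return (share of discordant segments that are V2, two-sided binomial P).

    Assumes the counts are (approximately) independent segments.
    """
    n = n_v2 + n_v3
    if n == 0:
        raise ValueError("no discordant segments to compare")
    # Exact two-sided test against p = 0.5: the null distribution is
    # symmetric, so double the upper tail at the larger of the two counts.
    k = max(n_v2, n_v3)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return n_v2 / n, min(1.0, 2.0 * upper_tail)

if __name__ == "__main__":
    print(discordance_asymmetry(480, 520))  # near-symmetric, as expected under ILS
    print(discordance_asymmetry(700, 300))  # strongly skewed, compatible with gene flow
```

A small P value only indicates asymmetry; the text additionally compares segment lengths and divergence before attributing a skew to introgression.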

The level of ILS generally increases with 
shorter internal branch lengths (Fig. 1A). In 
the taxon sampling of our present dataset, we 
find that ILS is particularly ubiquitous in Old 
World monkeys, which have undergone rapid 
speciation events. Notably, however, 32% ILS 
is estimated even on a very long and deep 
branch within Strepsirrhini and 57% on the 
branch separating tarsiers from Strepsir- 
rhini (branch 1 and branch 28, respectively; 
Fig. 1A), which suggests very large ancestral 
population sizes in these nodes that can also 
be predicted from the short size of the ILS 
fragments (fig. S7). Furthermore, the very 
high levels of ILS in gibbons and Old World 
monkeys, particularly macaques and baboons,
explain the long-standing difficulty of resolving
their phylogenetic relationships (6, 19, 21, 22). 
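
The dependence of ILS on branch length and ancestral population size described above follows from the standard neutral coalescent expectation for a three-species tree. The sketch below assumes a panmictic ancestral population of constant size and is not the CoalHMM model itself; the function name and example values are illustrative.

```python
from math import exp

def expected_ils(branch_generations: float, ne: float) -> float:
    """Expected proportion of discordant genealogies (V2 + V3).

    Two lineages fail to coalesce along an internal branch of T generations
    with probability exp(-T / (2 * Ne)); when that happens, two of the three
    equally likely topologies are discordant with the species tree.
    """
    return (2.0 / 3.0) * exp(-branch_generations / (2.0 * ne))

if __name__ == "__main__":
    # A short branch or a large ancestral population both raise ILS.
    for t, ne in [(50_000, 20_000), (50_000, 200_000), (500_000, 200_000)]:
        print(t, ne, round(expected_ils(t, ne), 3))
```

Under this expectation, a high ILS proportion on a long branch can only arise from a large ancestral Ne, which is the reasoning applied to branches 1 and 28.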


Speciation times and ancestral effective 
population sizes in the primate tree 


The reconstruction of the dated history of a 
group of species is typically based on genomic 
divergence rates turned into divergence times
through fossil calibrations (23-25). However, 
the genomic divergence times in species with 
large populations and long generation times 
can be much further back in time than the 
time when species actually split. The expected 
time for genomic coalescence on an ancestral 
branch is 2 × Ne generations older than the
time of speciation. For an ancient population
with an Ne of 200,000 and a generation
time of 10 years, the average expected genomic 
divergence time would be 4 million years fur- 
ther back in time than the actual species split 
time. The analysis of incongruences produced 
by ILS via CoalHMM allows direct estimation 
of speciation times as opposed to divergence 
times as well as estimation of the ancestral 
effective population sizes. 
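
The worked example in this paragraph can be reproduced directly. The sketch below only restates the 2 × Ne expectation from the text; the function names are illustrative.

```python
def coalescence_lag_years(ne: float, generation_time_years: float) -> float:
    """Expected extra time to coalescence in the ancestral population.

    The expectation is 2 * Ne generations, converted to years.
    """
    return 2.0 * ne * generation_time_years

def speciation_time_years(divergence_time_years: float, ne: float,
                          generation_time_years: float) -> float:
    """Speciation time implied by a mean genomic divergence time."""
    return divergence_time_years - coalescence_lag_years(ne, generation_time_years)

if __name__ == "__main__":
    lag = coalescence_lag_years(200_000, 10)
    print(f"{lag / 1e6:.1f} million years")  # 4.0 million years
```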

We used the parameters estimated by
CoalHMM together with simulations and a
random forest model to derive ancestral ef- 
fective population sizes and speciation times 
in all nodes with >5% of ILS (supplementary 
materials, section 10, and fig. S20). We then 
rescaled the parameters by estimated yearly 
mutation rates, which we derived from the 
relationship between pedigree-based yearly 
mutation rate and generation time, and the 
relationship between inferred body mass of 
extant and ancestral species and generation 
time (26, 27) (supplementary materials, sec- 
tion 7). The resulting tree (fig. S21) was close 
to ultrametric and was linearized to make the 
speciation time tree shown in Fig. 1B (and 
that in fig. S22).

We infer ancestral effective population sizes 
that vary more than an order of magnitude 
within the primate phylogeny. In the few 
cases where ancestral effective population sizes 
of primate lineages have been estimated by 
other approaches, they are in good agreement 
with our estimates (27, 28-30). For instance, 
Warren et al. (29) estimated effective population
sizes in the ancestors of the Chlorocebus
lineage at around 40,000 using a multiple
sequentially Markovian coalescent (MSMC) approach,
whereas we infer an ancestral population
size of 58,000, and Schrago and Seuanez (27)
estimated Ne in the ancestors of Aotus
and Callitrichinae at >240,000 using an MSMC
approach, whereas we infer an ancestral population
size of 330,000. Most estimated ancestral
Ne values are higher than effective population 
sizes estimated for primates today. This might 
reflect the fact that the ancestors of primates 
had smaller body sizes (31), which is known to
be associated with larger population sizes, or
that lineages with small population sizes are
more likely to go extinct, leaving no descen- 
dants to sample from (32). As expected, popu- 
lation size estimates are negatively correlated 
with the median segment size of the discor- 
dant topologies (fig. S24). We also find an 
expected negative correlation between our 
estimate of ancestral Ne and the efficiency of 
purifying selection measured as dN/dS (the 
ratio of nonsynonymous to synonymous mu- 
tations) on the ancestral branches (fig. S25; 
P = 0.0015) and an expected negative correla- 
tion between average segment length and 
dN/dS (fig. S26). 

Our inferred species split times are gener- 
ally in good agreement with independent esti- 
mates from the fossil record when these exist 
(Fig. 1B, inset, and table S4), which supports 
that our approach can also infer speciation 
times on nodes that lack fossil evidence with- 
out the need for fossil calibration. Previous 
studies extrapolating the speciation time on 
the basis of pedigree-based mutation rates 
back in time have generally led to estimated 
times much further back in time than those 
suggested by the fossil record (6, 33, 34). We 
see two reasons for this. First, the large ef- 
fective population sizes imply that divergence- 
based estimates of split times are several million 
years further back in time than the actual spe- 
cies split times. Second, our analysis rescales 
branch lengths by yearly mutation rates de- 
pendent on body size and generation time. 


Highly variable frequency of ILS along 
the genome 


Under selective neutrality, ILS is expected to 
occur at random along the genome. However, 
if natural selection, either directly or indirectly, 
affects the coalescent process of a genomic re- 
gion, the sorting of lineages with deep coales- 
cence will not be random (12, 35, 36). We 
painted all the genomes of the 29 ancestral 
branches by the level of ILS in 100-kb win- 
dows displayed as horizon plots (37, 38) (fig. 
S8) and found many regions that experienced 
either high or low levels of ILS in the same 
genomic positions across several ancestral 
nodes in the primate phylogeny. We there- 
fore integrated the ILS inference across the 
29 branches using normalized ILS scores dis- 
played in a single horizon plot showing the 
general pattern of ILS with the human ge- 
nome coordinates as reference (Fig. 2A). This 
integrated signal of ILS shows that certain 
regions have consistently high or low levels of 
ILS. As an example, ILS is reduced in a large 
genomic region from 40 to 60 Mb on chromo- 
some 3 (chr3) (Fig. 2B), which suggests either 
repeated selective sweeps or strong background 
selection (11, 36). By contrast, the human leukocyte
antigen-major histocompatibility complex
(HLA-MHC) cluster on position 27 to
33 Mb on chr6 has several regions showing
extremely high ILS, likely as a result of balancing
selection (Fig. 2C).

Fig. 2. Genome-wide distribution of ILS levels. (A) Horizon plot of the mean z-standardized ILS values in 100-kb windows (x coordinates in megabases). Red colors
represent regions low in ILS, and blue colors represent high-ILS regions. Missing data are represented by a horizontal line. Regions marked with a rectangle in
(A) are zoomed in. (B to D) A low-ILS region in chr3 (B), the MHC in chr6 (C), and the PAR region of the X chromosome (D). (B) to (D) are all horizon plots for all of
the 29 individual nodes, where each node is mapped to Fig. 1A, inset. Mbp, megabase pairs.

Additionally, the
pseudoautosomal region (PAR) on position
0 to 2.7 Mb on the X chromosome also contains
much higher ILS than the rest of the
X chromosome (Fig. 2D) and, in many nodes, 
much higher ILS than the autosomal average. 
These and many other consistent patterns 
suggest that there are genomic and/or func- 
tional determinants of ILS that persist across 
the primate phylogeny. 
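
The integration of ILS across branches described in this section can be sketched as follows: ILS proportions per window are z-standardized within each branch and then averaged across branches. This is a minimal illustration; the use of the population standard deviation, the handling of missing windows, and all names are assumptions rather than details taken from the supplementary materials.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

def integrated_ils(
    ils_by_branch: Dict[str, List[Optional[float]]]
) -> List[Optional[float]]:
    """Mean z-standardized ILS per window across branches (None = missing)."""
    z_by_branch = []
    for values in ils_by_branch.values():
        observed = [v for v in values if v is not None]
        if len(observed) < 2 or pstdev(observed) == 0:
            # Not enough variation to standardize this branch.
            z_by_branch.append([None] * len(values))
            continue
        mu, sd = mean(observed), pstdev(observed)
        z_by_branch.append(
            [None if v is None else (v - mu) / sd for v in values]
        )
    combined = []
    for window in zip(*z_by_branch):
        present = [z for z in window if z is not None]
        combined.append(mean(present) if present else None)
    return combined

if __name__ == "__main__":
    example = {
        "branch_1": [0.10, 0.20, 0.30, None],
        "branch_2": [0.20, 0.40, 0.60, 0.40],
    }
    print(integrated_ils(example))
```

Standardizing within each branch first keeps branches with very high overall ILS from dominating the combined signal.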


Determinants of the variation in ILS along 
the genome 


Recombination is not expected to directly af- 
fect the amount of ILS but can do so indirectly 


Rivas-Gonzalez et al., Science 380, eabn4409 (2023)


because the amount of recombination deter- 
mines the efficacy of both positive and nega- 
tive selection and, thus, the amount of diversity 
that is lost because of selection at linked ge- 
nomic positions. A general observation of a 
positive correlation between nucleotide diver- 
sity and the recombination rate in extant spe- 
cies, including humans, has been interpreted 
as evidence for both the action of linked se- 
lection and as a mutagenic effect of recom- 
bination (39-41). ILS patterns will not be 
affected by the latter, so we investigated how 
ILS depends on recombination rate by extrap- 
olating the human pedigree-based recombi- 
nation map (42) at a 100-kb scale to the whole 
primate phylogeny. We inferred ILS levels and
the corresponding relative local Ne as a function
of recombination rate divided into ten
bins (fig. S15 and supplementary materials, 
section 8). We find that the Ne of genomic 
regions with the highest recombination rate is 
typically 1.3-fold to fourfold larger than that in 
the lowest recombination bin (Fig. 3A), which 
implies that linked selection has removed a 
large proportion of the diversity in the ances- 
tral species. Additionally, the extent of the ef- 
fect of linked selection on genetic variation that 
we observe is likely underestimated because 
the present-day human recombination map 
is an imperfect proxy of the recombination 
landscape in ancestral species separated by 
tens of millions of years from humans.
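
One way to see how a difference in ILS between recombination bins translates into a relative Ne is to invert the neutral expectation ILS = (2/3) exp(-T / 2Ne) for two bins that share the same branch length T. This is a sketch under that assumption; eq. S3 of the supplementary materials is not reproduced here and may differ in form, and the function name is illustrative.

```python
from math import log

def relative_ne(ils_high_bin: float, ils_low_bin: float) -> float:
    """Ne(high-recombination bin) / Ne(low-recombination bin) on one branch.

    From ILS = (2/3) * exp(-T / (2 * Ne)) it follows that
    T / (2 * Ne) = -ln(1.5 * ILS). T is shared by both bins and cancels.
    """
    for p in (ils_high_bin, ils_low_bin):
        if not 0.0 < p < 2.0 / 3.0:
            raise ValueError("ILS proportion must lie strictly between 0 and 2/3")
    return log(1.5 * ils_low_bin) / log(1.5 * ils_high_bin)

if __name__ == "__main__":
    # Hypothetical ILS proportions for the highest and lowest recombination bins.
    print(round(relative_ne(0.40, 0.20), 2))
```

The same inversion applies to any pair of genomic partitions on a branch, for example exons versus intergenic regions.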
Fig. 3. Determinants of variation in ILS and corresponding Ne. (A) Difference
in the proportion of ILS between the lowest recombination and the highest
recombination deciles against the proportion of ILS in the highest recombination 
decile. Each numbered point represents a node in the phylogeny mapped to 
Fig. 1A, inset. The color and lines represent the relative change in Ne between the 
low and high recombination deciles, calculated using eq. S3 in the supplementary 
materials, section 8. (B and C) Comparison of the mean z-standardized 
proportion of ILS across 29 branches and the human (green), chimp (purple), 
and baboon (orange) recombination maps in the telomeres (B) and in chr2 (C).
In (C), the fusion point is represented by a vertical line. (D) Difference between
the ILS proportion of chromosome X and autosomes, where each numbered point is 
a node in the phylogeny mapped to Fig. 1A, inset. The color and lines represent 
the relative change in Ne between chromosome X and autosomes, calculated using 
eq. S3 in the supplementary materials, section 8. (E) Difference between the 
proportion of ILS in either exons (green) or introns (blue) and intergenic regions 
(red). Each numbered point and the corresponding vertical line represent one of 
29 nodes in the phylogeny, mapped to Fig. 1A, inset. The colored lines represent 
fitted models that translate into a constant reduction in Ne across nodes. 


Telomeres recombine more frequently than 
the rest of the genome (42-45). The integrated 
signal across all nodes and autosomal telo- 
meres (Fig. 3B) shows a peak in the telomeric 
ILS that agrees with human (42), chimpanzee 
(45), and olive baboon (46) recombination maps 
at the tips of the chromosomes. Moreover, there 
is an increased signal of ILS at around posi- 
tion 114 Mb of chr2 (in human coordinates) 
(Fig. 3C), which corresponds to the remnants 
of an ancient telomere-telomere fusion affect- 
ing only the human lineage (47). Notably, we 
can only detect the corresponding peak in 
recombination in this region using the re- 
combination map of nonhuman primate 
species, which suggests that, although big 
chromosomal rearrangements might markedly
change the present-day recombination
patterns, ILS can still be used to infer the
ancestral recombination landscape in the 
primate phylogeny (48). 

We next contrasted the ILS on the X chro- 
mosome with that on the autosomes. Because 
males only carry a single copy of the X chro- 
mosome in primates, and, consequently, it has 
a smaller effective population size, the X chro- 
mosome is expected to have lower ILS. We 
find that the X chromosome has an overall 
lower amount of ILS compared with the auto- 
somal average (Fig. 3D and fig. S6), with the 
decrease corresponding to the Ne of chromo- 
some X being between 50 and 75% of that of 
the autosomes. Under random mating and an unbiased
sex ratio, the NeX/NeA ratio is expected
to equal 75% (49). However, in primates, males
typically have the higher variance in reproductive
success (50), which should raise the ratio above
0.75 and is therefore at odds with our
observed ratios smaller than 0.75 (Fig. 3D).
Previous surveys of chromosome X to auto- 
some diversity have also often reported ratios 
below 0.75—e.g., 0.6 in non-African humans 
(51), 0.4 in gorillas, 0.5 in orangutans (52), 
and 0.3 in macaques (53). These observations 
have often been ascribed to differences in male 
and female mutation rates and recent bottle- 
neck effects affecting the X chromosome di- 
versity more than the autosomal diversity (54). 
However, sex differences in mutation rates 
should not affect ILS inference, and bottle- 
necks are unlikely as a general explanation 
throughout the primate phylogeny. We thus 
conclude that the large reduction in ILS on 
the X chromosome is likely a result of linked 
selection targeting the X chromosome to a
larger extent than the autosomes, as has been
reported previously in the human-chimpanzee 
ancestral species (12). The 1.5- to 2.7-Mb PAR 
of the X chromosome is very high in ILS in most 
ancestral species (fig. S8). This is consistent with 
its very high recombination rate in males— 
~22 times the genome average rate, which 
minimizes the effect of linked selection—and 
its high polymorphism in great apes (55). 
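
The neutral expectation for the X-to-autosome ratio discussed above can be checked with the classical formulas for effective population size under unequal numbers of breeding males and females. These are textbook expressions, not the model fitted in the paper, and the function names and example numbers are illustrative.

```python
def ne_autosome(n_males: float, n_females: float) -> float:
    """Autosomal Ne with Nm breeding males and Nf breeding females."""
    return 4.0 * n_males * n_females / (n_males + n_females)

def ne_x(n_males: float, n_females: float) -> float:
    """X-chromosome Ne with Nm breeding males and Nf breeding females."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)

def x_to_autosome_ratio(n_males: float, n_females: float) -> float:
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)

if __name__ == "__main__":
    print(x_to_autosome_ratio(100, 100))  # 0.75 with an equal breeding sex ratio
    print(x_to_autosome_ratio(10, 100))   # above 0.75 when few males reproduce
```

Fewer effective breeding males push the neutral ratio above 0.75, so demography of this kind cannot produce the observed ratios of 0.50 to 0.75, which is consistent with the appeal to linked selection.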

The strong positive correlation between ILS 
and recombination (fig. S15) suggests that 
positive and negative selection events had a 
strong impact on the removal of diversity in 
the ancestral species. These selective events 
are more likely enriched in genes, so we con- 
trasted the amount of ILS in coding regions, 
introns, and intergenic regions (Fig. 3E). We 
find that, for all internal branches, ILSexon <
ILSintron < ILSintergenic. We estimate that a
constant average reduction in the Ne of exons 
of 31% compared with the Ne in intergenic 
regions across the primate nodes would amount 
to the observed decrease in exonic ILS (P < 2 ×
10^-16; SD = 14). Additionally, introns have an
estimated average reduction in Ne of 10%
compared with intergenic regions (P < 2 ×
10^-16; SD = 0.5), which we interpret as a direct
effect of their closer physical proximity to exons, 
leaving intronic ILS more strongly affected by 
linked selection than intergenic ILS. 


ILS and gene function 


Finally, we investigated whether certain gene 
categories are more likely to experience high 
levels of ILS than others—either because they 
experience less purifying selection and adaptive 
evolution or because they are more likely to 
be under balancing selection. We performed 
gene ontology enrichment tests with ILS as 
the response variable (supplementary mate- 
rials, section 12). 

We identify the most significant gene on- 
tology terms enriched for either high or low 
ILS genes across the primate nodes and plot 
the gene ontology terms as a function of their 
average dN/dS ratio (Fig. 4A). As expected, 
more selectively constrained gene categories 
have significantly lower ILS than the genic 
average (correlation coefficient, 7 = 0.35; P = 
2.68 x 10°"°). These include many house-keeping 
gene categories and gene categories associated
with chromosome organization and regulation. 
The PIAS3 gene involved in transcriptional 
modulation is an example of consistently low 
ILS (Fig. 4B, left; other examples are in fig. S27). 

Notably, the two gene ontologies with the 
highest ILS are “cornification” and/or “keratinization”
and “immune response regulation.”
Cornification (enriched for high ILS in
12 nodes) and keratinization (enriched for 
high ILS in 17 nodes) are tightly related gene 
ontology terms that include epidermal and 
keratinization genes.

Fig. 4. ILS and gene function. (A) Relationship between the ILS and the median dN/dS for each gene ontology
term. Each data point corresponds to one gene ontology term, where the median dN/dS across all 29 nodes is
plotted on the y axis, and the mean z-standardized ILS across all 29 nodes is represented on the x axis. Blue points are
gene ontology terms that are significantly enriched for high ILS in at least one node, and red points correspond to
gene ontology terms significantly enriched for low ILS. The size of the data points represents the number of nodes for
which that gene ontology term has been significantly detected in the enrichment test. (B) Examples of genes
with consistently low ILS (PIAS3, left) or consistently high ILS (CD1A, right). Each row corresponds to the inferred
topologies (V0 or V1 in blue, and V2 or V3 in red) per genomic position for each node in the primate phylogeny.
The top gray bar represents exons (in thick lines) and introns (in thin lines).

Primates exhibit an extraordinary
degree of color variation across
and within species and even in different parts
of the body (56, 57), which highlights the 
importance of the phenotypic evolution of skin 
in primates. This high diversity of coloration is 
crucial as social and sexual signaling and is 
often under stabilizing selection or positive 
selection that is closely linked with the high 
variations of primates in ecological niches, 
color vision, mating, and social systems (58). 
Additionally, some of the keratin gene families
exhibit high levels of gene duplication
and high functional diversification in primates
(59-61). 

Immune response regulation genes have 
been reported to evolve under balancing se- 
lection in primates (62), consistent with their 
enrichment in high-ILS genes. The MHC in 
chr6 is an outstanding region enriched for ILS, 
especially in Old World monkeys. Many other 
genes related to the immune response in ge- 
nomic locations other than the HLA are also 
high in ILS. The detailed ILS pattern for the
CD1A gene (chr1), which is involved in the innate immune
response (Fig. 4B, right; other examples in fig.
S28), reveals an ILS proportion in this gene
above the average across the 29 nodes. Other
examples are the ULBP family and killer cell 
immunoglobulin-like receptor (KIR) proteins. 
This last family is highly diverse, and it is 
consistent with patterns of balancing selec- 
tion in several present-day human populations 
(63, 64) and other primates (65). 


Conclusion 


The inference of ILS on many nodes in the 
primate phylogeny allows us to estimate spe- 
ciation times and ancestral population sizes 
directly from genomic divergence data. We 
found that the effective population sizes have 
been very large in early primate evolution, at 
least in most lineages that have descendants 
today. This explains why the genomic divergence
time estimates are much further back
in time than the actual speciation times and 
why estimates of speciation events from trio- 
based germline mutation rates are often fur- 
ther back in time than the dating with fossil 
records. 

The high levels of ILS in most nodes of the 
primate phylogeny made it possible to inves- 
tigate the forces that shape genetic diversity 
along the genome in a complementary way to 
what has been done extensively using genome 
diversity data for individual species. We find 
that ILS depends strongly on the recombina- 
tion rate, likely illustrating that a large part of 
genetic diversity is being removed by selection 
at linked sites. This dependency may partly 
explain Lewontin’s paradox that the difference 
in genetic diversity across species is smaller 
than predicted from differences in neutral 
effective population sizes (66, 67). The preva- 
lence of natural selection at linked sites in- 
fluencing diversity in ancestral nodes and thus 
ILS is also clear from the reduced ILS in in- 
trons compared with intergenic regions. The X 
chromosome appears to undergo more nat- 
ural selection than the autosomes, perhaps 
as a consequence of male hemizygosity or 
possibly its strong role in male reproduction. 
Finally, ILS patterns also illuminate gene cat- 
egories under balancing selection, particularly 
related to cornification or keratinization and 
immune functions, often experiencing differ- 
ent genealogical history compared with the 
speciation process. 


Materials and methods summary 
Data, alignment, and species tree 


Our dataset consists of 50 primate species, in- 
cluding 27 newly sequenced ones and an out- 
group, Galeopterus variegatus. For detailed 
information on sequencing and assembling, 
see the accompanying paper (15). We generated
pairwise genome alignments using
LASTZ (v1.04.00) for each species versus the
human genome and then combined them with MULTIZ (v11.2)
into multiway alignments. After removing columns
of the alignment containing gaps in any
of the species, we randomly chose half of the 
columns to run ExaML with the GAMMA model 
with 100 bootstraps. We report the tree with 
the highest maximum likelihood (fig. S1). 


CoalHMM 


We designed a divide-and-conquer, automated 
CoalHMM pipeline to fit a hidden Markov 
model where hidden states are four different 
topologies (Fig. 1A, inset), namely the species 
tree topology (V0), the deep coalescent topology
following the species tree (V1), or one
of two alternative topologies incongruent with
the species tree (V2 and V3) (14). We defined
each branch with a quartet of genomes and
extracted them from the 51-way alignment
using MafFilter (68). We removed columns 
containing only gaps and merged consecu- 
tive blocks that were <200 nucleotides apart. 
Chunks of <2000 nucleotides were filtered 
out, and blocks were divided into groups con- 
taining roughly 1 Mb alignment each (fig. S2B). 
CoalHMM was first run in a subset of 1-Mb 
groups of blocks, and the means of each of the 
estimated population parameters (tau1, tau2,
theta1, theta2, c2, rho, and all the GTR model
values) were recovered and used as starting 
parameters for the second CoalHMM run on 
all the other 1-Mb groups of blocks (fig. S2D). 
The posterior probabilities for each of the four 
hidden states were collected for each 1-Mb run 
and mapped to human coordinates (fig. S2E). 
All the code for processing the files and run- 
ning CoalHMM is unified using a gwf workflow 
(https://gwf.app/), which can be accessed via 
https://github.com/rivasiker/autocoalhmm. 
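
The block handling described in this paragraph (merging blocks less than 200 nucleotides apart, discarding chunks shorter than 2000 nucleotides, and grouping the remainder into roughly 1-Mb units) can be sketched on plain coordinate intervals. This is an illustration only: the actual pipeline operates on alignment blocks with MafFilter, and the half-open `(start, end)` representation and function names here are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in reference coordinates

def merge_and_filter(blocks: List[Interval], max_gap: int = 200,
                     min_len: int = 2000) -> List[Interval]:
    """Merge consecutive blocks closer than max_gap, then drop short chunks."""
    merged: List[List[int]] = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_len]

def group_chunks(chunks: List[Interval],
                 target: int = 1_000_000) -> List[List[Interval]]:
    """Collect chunks into groups holding roughly `target` nucleotides each."""
    groups, current, size = [], [], 0
    for s, e in chunks:
        current.append((s, e))
        size += e - s
        if size >= target:
            groups.append(current)
            current, size = [], 0
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    chunks = merge_and_filter([(0, 1500), (1600, 3000), (5000, 5500)])
    print(chunks)
    print(group_chunks(chunks))
```

Grouping into units of similar size lets each CoalHMM run be scheduled independently, which fits the divide-and-conquer design described above.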


Genomic determinants of ILS 


We used the latest deCODE human recombina- 
tion map from Halldorsson et al. (42), the chim- 
panzee recombination map from Auton et al. 
(45), and the olive baboon recombination map 
from Sørensen et al. (46) to divide the genome
into 10 equally sized recombination bins at a 
100-kb resolution. We then calculated the 
mean ILS for each bin. 
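
The binning step can be sketched as follows: 100-kb windows are ranked by recombination rate, split into ten bins with equal numbers of windows, and the ILS proportion is averaged within each bin. Whether "equally sized" refers to window counts or to rate ranges is not stated here, so the equal-count choice, like the function name, is an assumption.

```python
from statistics import mean
from typing import List, Tuple

def mean_ils_by_recombination_bin(
    windows: List[Tuple[float, float]], n_bins: int = 10
) -> List[float]:
    """Mean ILS per recombination bin, from lowest to highest rate.

    `windows` holds one (recombination_rate, ils_proportion) pair per window.
    """
    if len(windows) < n_bins:
        raise ValueError("need at least one window per bin")
    ranked = sorted(windows, key=lambda w: w[0])
    n = len(ranked)
    result = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across bins.
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        result.append(mean(ils for _, ils in chunk))
    return result

if __name__ == "__main__":
    toy = [(rate, 0.1 + 0.002 * rate) for rate in range(100)]
    print([round(x, 3) for x in mean_ils_by_recombination_bin(toy)])
```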

We retrieved intron and exon information 
from the knownGene UCSC Genome Browser 
table for hg38 (69-71) and kept only protein- 
coding genes that appear in the knownCanonical 
UCSC Genome Browser table (72). After trim- 
ming for size (supplementary materials, sec- 
tion 8), ILS level was calculated for exons and 
introns separately. 


Introgression 


We compared the level of divergence between 
sister species for segments of the genome at- 
tributed with the four different topologies to 
assess whether the level of incongruences that 
we report could be influenced by introgression.
We used MafFilter to extract, filter, and concatenate
segments and computed the percentage
of mismatch as a measure of divergence
between the sister species of the alignment (i.e.,
species 1 and species 2 for the segments assigned
to the states V0 or V1, species 1 and species
3 for the segments assigned to the states
V2, and species 2 and species 3 for the segments
assigned to the states V3). We performed
nonparametric Tukey-Kramer tests to compare
the divergence distributions of V0, V2, and
V3 segments.
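
The divergence measure described above can be sketched as a percentage of mismatches between the two species that are sisters under each hidden state. The pairing table follows the text; the skipping of gaps and ambiguous bases, the zero-based species indices, and the function names are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

# Zero-based indices of the sister pair under each topology:
# species 1 and 2 for V0/V1, species 1 and 3 for V2, species 2 and 3 for V3.
SISTER_PAIR: Dict[str, Tuple[int, int]] = {
    "V0": (0, 1), "V1": (0, 1), "V2": (0, 2), "V3": (1, 2),
}

def percent_mismatch(seq_a: str, seq_b: str) -> float:
    """Percentage of differing sites among positions with A/C/G/T in both."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared

def sister_divergence(state: str, aligned: Sequence[str]) -> float:
    """Divergence between the sister species for a segment in `state`."""
    i, j = SISTER_PAIR[state]
    return percent_mismatch(aligned[i], aligned[j])

if __name__ == "__main__":
    segment = ["ACGTACGT", "ACGAACGT", "ACGTACGA"]
    print(sister_divergence("V2", segment))
```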


Population parameters reconstruction 


The population parameters tau1, tau2, theta1,
theta2, c2, and rho (defined as in fig. S20) 
outputted by CoalHMM are biased because of 
the use of a restricted set of four possible to- 
pologies to model the continuity of possible 
coalescence times (14), so we developed a 
machine learning-based procedure to learn 
how the different combinations of parameter 
values influence the bias of each parameter 
and then used this knowledge to predict the 
bias on real data. Briefly (but see supplemen- 
tary materials, section 10), we ran CoalHMM 
on alignment blocks simulated under a grid of 
known combinations of population parameters 
using msprime (73). We then used the sim- 
ulated versus estimated population parameters 
to train a random forest model and estimated 
the bias in our data on the basis of the estimates 
outputted by CoalHMM on the primate dataset.


dN/dS 


We recovered 9972 coding gene alignments 
and filtered for orthologous genes where at 
least 41 out of the 50 primate species and the 
outgroup were present. Protein sequences
were aligned using PRANK (74) and then filtered
by Gblocks (75). Nucleotide alignments
were generated by applying the protein align- 
ment and site selection to the corresponding 
nucleotide sequences. We estimated branch- 
specific dN/dS ratios using the branch model 
of Codeml from PAML 4 (23). Results are re- 
ported in table S5. 


Gene ontology 


A gene ontology (GO) enrichment test was car- 
ried out for both high-ILS and low-ILS genes in 
each node using GOATOOLS (76). Gene anno- 
tations were downloaded from the National 
Center for Biotechnology Information’s file 
transfer protocol (FTP) server
(ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz). For each
branch, genes were assigned to be high in ILS 
if their exonic ILS was in the top 30%, whereas 
genes were classified as low ILS if they were in 
the bottom 30%. The significance level for the 
enrichment test was set to 0.05 after false dis- 
covery rate correction. A full list of the en- 
riched gene ontology terms can be found in 
table S6. 
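
The gene classification and multiple-testing step can be sketched as follows. Genes are ranked by exonic ILS and the top and bottom 30% are labeled high and low ILS; P values are then adjusted for the false discovery rate. The Benjamini-Hochberg procedure is shown as one common FDR correction and is an assumption, since the specific method is not named here; the enrichment test itself (run with GOATOOLS in the paper) is not reproduced.

```python
from typing import Dict, List, Set, Tuple

def classify_genes(exonic_ils: Dict[str, float],
                   fraction: float = 0.30) -> Tuple[Set[str], Set[str]]:
    """Return (high-ILS genes, low-ILS genes) for one branch."""
    ranked = sorted(exonic_ils, key=exonic_ils.get)
    k = int(len(ranked) * fraction)
    if k == 0:
        return set(), set()
    return set(ranked[-k:]), set(ranked[:k])

def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    ils = {f"gene{i}": i / 10 for i in range(1, 11)}
    high, low = classify_genes(ils)
    print(sorted(high), sorted(low))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Terms whose adjusted P value falls below 0.05 would then be reported as enriched for that branch.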
INTRODUCTION: Hybridization is increasingly
recognized as an important evolutionary
force for generating species and phenotyp- 
ic diversity in plants and animals. This is 
especially common in lineages that can to- 
lerate whole-genome duplication and in- 
creased levels of ploidy. However, the role 
of hybridization in generating species and 
phenotypic diversity of lineages without 
polyploidization is underappreciated, es- 
pecially in nonhominoid mammals. 


The hybrid origin and genetic basis of mosaic coat coloration for the gray snub-nosed monkey.
Interspecific hybridization between the golden snub-nosed monkey and the ancestor of black-white/black
snub-nosed monkeys led to the genomic admixture of the gray snub-nosed monkey. Alleles of positively
selected genes related to melanogenesis were alternately inherited from parental lineages A and B
and contributed to the mosaic coat coloration of the hybrid species.

RATIONALE: The snub-nosed monkey genus
Rhinopithecus comprises five allopatric and
morphologically differentiated species, the
black-white snub-nosed monkey Rhinopithecus
bieti, the black snub-nosed monkey Rhinopithecus
strykeri, the golden snub-nosed monkey
Rhinopithecus roxellana, the gray snub-nosed
monkey Rhinopithecus brelichi, and the Tonkin
snub-nosed monkey Rhinopithecus avunculus.
They possess the same chromosome number,
and it has been speculated that they have hybridized
in the past. To examine the speciation
histories of these species, we generated
a chromosome-level high-quality reference 
genome assembly for the black-white snub- 
nosed monkey and analyzed 106 resequenced 
genomes of individuals from all five species. 
We conducted multiple population genomic 
analyses—including ADMIXTURE, D-statistics, 
phylogenetic reconstruction, and evolution- 
ary scenario simulations—to investigate the 
genomic admixture of these species. We fur- 
ther applied genomic selective scans and func- 
tional assays to reveal the likely genetic basis 
of mosaic coat coloration of the hybrid spe- 
cies. Possible mechanisms of premating and 
postmating reproductive isolation barriers 
between the hybrid species and its parents 
are briefly discussed. 


RESULTS: We show that historical hybridiza- 
tion directly resulted in the origin of the gray 
snub-nosed monkey. Population genomic analy- 
ses provided evidence for apparent genomic 
admixture across genomes of all gray snub- 
nosed monkeys from two parental lineages, 
the golden snub-nosed monkey and an an- 
cestor of the black-white/black snub-nosed 
monkeys, with the majority of genome derived 
from the golden snub-nosed monkey. As a re- 
sult of hybridization, the hybrid species pos- 
sesses a mosaic of the color patterns of its 
parents. Genomic selection scans and func- 
tional assays identify several key melanogenesis- 
related genes (PAH, APC, SLC45A2, MYO7A, 
and ELOVL4). Alleles of these genes were alternately
inherited from each parent, likely
producing the mosaic coat coloration of the 
hybrid monkey and promoting premating 
reproductive isolation of the hybrid species 
from both parents. In addition, alternate in- 
heritance of divergent alleles at many loci, 
especially those involved in genetic incom- 
patibility between the parents, may have con- 
tributed to postmating reproductive isolation 
of the gray snub-nosed monkey. 


CONCLUSION: We report a notable example of
hybrid speciation in primates and present a
detailed evolutionary scenario from the ge- 
nomic admixture to the likely reproductive 
isolation establishment owing to alternate 
inheritance of divergent alleles from parents. 
This study highlights the underappreciated role 
of interspecific hybridization in species and 
phenotypic diversity in mammals. 
Hybridization is widely recognized as promoting both species and phenotypic diversity. However, its
role in mammalian evolution is rarely examined. We report historical hybridization among a group of
snub-nosed monkeys (Rhinopithecus) that resulted in the origin of a hybrid species. The geographically
isolated gray snub-nosed monkey Rhinopithecus brelichi shows a stable mixed genomic ancestry
derived from the golden snub-nosed monkey (Rhinopithecus roxellana) and the ancestor of black-white
(Rhinopithecus bieti) and black snub-nosed monkeys (Rhinopithecus strykeri). We further identified
key genes derived from the parental lineages, respectively, that may have contributed to the mosaic coat
coloration of R. brelichi, which likely promoted premating reproductive isolation of the hybrid from
parental lineages. Our study highlights the underappreciated role of hybridization in generating species
and phenotypic diversity in mammals.


Hybridization is increasingly recognized
as an important evolutionary force for
generating species and phenotypic diversity
in plants and animals (1-4). This
is especially common in lineages that
can tolerate whole-genome duplication and 
increased levels of ploidy (5). However, the 
role of hybridization in generating species 
and phenotypic diversity of lineages without 
polyploidization is underappreciated, espe- 
cially in nonhominoid mammals. In this study, 
we report historical hybridization among a 
group of snub-nosed monkeys (Rhinopithe- 
cus) that resulted in the origin of a hybrid 
species. 

The genus Rhinopithecus includes five al- 
lopatric and morphologically differentiated 
species (the black-white snub-nosed monkey 
Rhinopithecus bieti, the black snub-nosed monkey
Rhinopithecus strykeri, the golden snub-
nosed monkey Rhinopithecus roxellana, the 
gray snub-nosed monkey Rhinopithecus brelichi, 
and the Tonkin snub-nosed monkey Rhino- 
pithecus avunculus) (Fig. 1). They possess the 
same chromosome number, and it has been 
speculated that they have hybridized in the 
past (6-8). To examine the speciation histories 
of these species, we generated a chromosome- 
level high-quality reference genome assembly 
(270-fold coverage) for the black-white snub- 




nosed monkey and analyzed 106 resequenced 
genomes of individuals from all five species 
(12.12-fold coverage on average) (fig. S1 and
tables S1 to S3). Our comprehensive analyses 
of these data supported the hybrid origin of 
the gray snub-nosed monkey and identified 
several key genes that may have contributed 
to the mosaic coat coloration and premating 
reproductive isolation of this hybrid species. 


Apparent genetic mixture across genome 
of the gray snub-nosed monkey 


Analyses by means of ADMIXTURE set at two 
postulated ancestral populations (K = 2) showed 
that all individuals of the gray snub-nosed mon- 
key possessed an admixed genome derived 
from two parental lineages: the golden snub- 
nosed monkey (parent A) and an ancestral 
lineage of the black-white/black snub-nosed 
monkeys (parent B), with the majority of 
genome derived from the golden snub-nosed 
monkey (69 and 61.75% of autosomal and X 
chromosomal components, respectively) (Fig. 
2A and fig. S2). Principal components analy- 
ses (PCA) supported this finding, placing the 
gray snub-nosed monkey in an intermediate 
position between its two presumed parental 
lineages (Fig. 2B). Furthermore, both analyses 
that were conducted on down-sampled datasets, 
to avoid unbalanced sampling effects, yielded 
similar results (figs. S3 to S8). The mixed ge- 
nomic ancestry of the gray snub-nosed monkey 
was stable across all examined individuals with 
the ancestry proportion and tract length of 
the golden snub-nosed monkey (parent A, the 
major contributor) significantly larger than 
that of the ancestor to the black-white/black
snub-nosed monkeys (parent B, the minor
contributor) (Fig. 2, C and D, and fig. S9). This
indicates that the dual ancestry of the gray
snub-nosed monkey is unlikely to have arisen
from recent introgression (9).

Fig. 1. Geographic distributions of the five snub-nosed monkey species.
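As a rough illustration of the two summaries compared in this paragraph (ancestry proportion and ancestry tract length per parental lineage), the following Python sketch computes both from a list of ancestry tracts on a single haploid genome. The `Tract` record, the parent labels "A" and "B", and the example coordinates are hypothetical; they are not taken from the study's data or its analysis pipeline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Tract:
    """One contiguous ancestry tract on a haploid genome (hypothetical record)."""
    start: int   # start coordinate in bp, inclusive
    end: int     # end coordinate in bp, exclusive
    parent: str  # "A" (golden) or "B" (ancestor of black-white/black)

    @property
    def length(self) -> int:
        return self.end - self.start


def summarize_tracts(tracts):
    """Return {parent: (ancestry proportion, mean tract length in bp)}."""
    total = sum(t.length for t in tracts)
    if total == 0:
        raise ValueError("no tracts with positive length")
    summary = {}
    for parent in sorted({t.parent for t in tracts}):
        lengths = [t.length for t in tracts if t.parent == parent]
        summary[parent] = (sum(lengths) / total, mean(lengths))
    return summary


# Made-up tracts for one haplotype, for illustration only.
example = [
    Tract(0, 700, "A"),
    Tract(700, 1000, "B"),
    Tract(1000, 1800, "A"),
    Tract(1800, 2000, "B"),
]
summary = summarize_tracts(example)
for parent, (proportion, mean_length) in summary.items():
    print(parent, proportion, mean_length)
```

Repeating such a summary for each haploid genome would allow the per-lineage proportions and tract lengths to be compared across individuals, which is the kind of comparison the text describes.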

We further evaluated whether this admixed 
ancestry arose from ancient hybridization by 
using phylogenetic and D-statistics analyses 
as well as fd scans (10, 11). We built 27,943 trees
across the genome using 100-kb windows and 
found three major topologies (Fig. 3A). The 
most common tree topology (Tree-1; 62.34% 
of the total) (Fig. 3A) was identical to that of 
the previously reported species tree in which 
the gray snub-nosed monkey clustered with the 
golden snub-nosed monkey (parent A) (12). 
By contrast, the second-most common tree 
topology (Tree-2; 22.64% of the total) (Fig. 3A) 
supported the clustering of the gray snub- 
nosed monkey with the ancestor of the black- 
white/black snub-nosed monkeys (parent B). 


Only 14.54% of the trees displayed a topology 
(Tree-3) (Fig. 3A) in which the gray snub-nosed 
monkey diverged from its two presumed par- 
ental lineages. Both Tree-1 and Tree-2 types 
were significantly more common than Tree-3 
(both P < 2.2 × 10^-16; Student’s t test). In addi-
tion, windows of the genome supporting Tree-1
and Tree-2 were significantly larger than those
supporting Tree-3 (P < 2.2 × 10^-16 and P =
1.12 x 10%, respectively; Mann-Whitney U test) 
(Fig. 3, A and B). Such conflicts of tree to- 
pologies more likely arose from the effects 
of hybridization than incomplete lineage 
sorting, under which all tree topologies are 
expected to have similar proportions and dis- 
tributions of window sizes. Signals of his- 
torical hybridization were also evident from 
analyses of D-statistics and fd genomic scans
(fig. S10 and tables S4 to S5), and as found 
for other species with a history of hybridiza- 
tion (13), recombination rate was significantly 
(P < 0.05) lower in the major ancestral genome
component of the gray snub-nosed
monkey genome (derived in this case from 
the golden snub-nosed monkey, parent A) 
than in the minor component (derived from 
the ancestor of the black-white/black snub- 
nosed monkeys, parent B) (Fig. 2, C and D, 
and fig. S11). 
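To make the arithmetic behind these comparisons concrete, the Python sketch below converts the reported topology percentages into approximate window counts and applies the standard normalized difference used for Patterson's D, (ABBA - BABA) / (ABBA + BABA). Applying the same ratio to the Tree-2 and Tree-3 counts is only an illustrative summary of their imbalance; it is not the test reported here (Student's t test and Mann-Whitney U test), and the function names are assumptions made for this example.

```python
def patterson_d(abba: int, baba: int) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA); 0 means no excess sharing."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / total


def topology_counts(n_trees: int, percentages: dict) -> dict:
    """Convert reported topology percentages into approximate window counts."""
    return {name: round(n_trees * pct / 100) for name, pct in percentages.items()}


# Values reported in the text: 27,943 window trees and three major topologies.
N_TREES = 27943
REPORTED_PCT = {"Tree-1": 62.34, "Tree-2": 22.64, "Tree-3": 14.54}

counts = topology_counts(N_TREES, REPORTED_PCT)

# Normalized excess of Tree-2 over Tree-3 windows (illustrative only,
# not the statistical test used in the study).
minor_excess = patterson_d(counts["Tree-2"], counts["Tree-3"])

print(counts)
print(round(minor_excess, 3))
```

A value near zero would indicate that the two less common topologies occur about equally often; a clearly positive value reflects the excess of Tree-2 windows that the text attributes to hybridization rather than incomplete lineage sorting.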


Genomic coalescent simulation further 
supports the hybrid origin of the gray 
snub-nosed monkey 


To determine whether the hypothesis of a hy- 
brid origin was more likely than other spe- 
ciation hypotheses, we extracted the joint site 
frequency spectrum from population genomic 
data of the gray snub-nosed monkey and its 
two assumed parental lineages and used co- 
alescent simulations to infer the most likely 
evolutionary scenario for the origin of the 
gray snub-nosed monkey (fig. S12 and table 
S6). The best-fitting model (model 18 in Fig. 
3C and tables S6 and S7) again supported a 
Fig. 2. ADMIXTURE, PCA, and genomic parental ancestry tract analyses. (A) ADMIXTURE results for 106 snub-nosed monkeys. (B) PCA results for 106 snub-nosed
monkeys. (C) Ancestry proportions across all autosomes of the gray snub-nosed monkey that were contributed by parent A (red) and parent B (blue). (D) Boxplots of
the ancestry tract lengths. Red, ancestry inherited from parent A (the golden snub-nosed monkey); blue, ancestry inherited from parent B (the ancestor of the black-
white/black snub-nosed monkeys). For the five gray snub-nosed monkeys, 10 haploid genomes were calculated, respectively (Hap1 to Hap10).
Fig. 3. Genomic phylogenetic analysis and evolutionary scenario simulation. (A) The major three phylogenetic tree topologies and their distributions in
consecutive windows. (B) Genomic distributions of all topologies identified. (C) The best-fitting evolutionary scenario inferred by coalescent simulation. T-Div is the
time of divergence of Rhinopithecus. T-Hybrid is the time of the hybridization between the two parental lineages. The respective genetic contributions of the two
parents to the hybrid origin species, the gray snub-nosed monkey, are 24.84 and 75.16%. Curved arrows of different thicknesses indicate the magnitudes of
subsequent interspecific gene flows.


hybrid origin of the gray snub-nosed mon- 
key from the golden snub-nosed monkey and 
the ancestral lineage of black-white/black 
snub-nosed monkeys, with more ancestry con- 
tributed by the golden snub-nosed monkey 
than the other lineage. Because both mater- 
nal mitochondrial genome and paternal Y- 
linked gene trees (fig. S13) supported a sister 
relationship of the gray snub-nosed mon- 
key and the golden snub-nosed monkey, we 
tested whether a hybridization-backcrossing 
scenario (fig. S12, models 20 and 21) could 
explain this. This showed that the fit of these 
models was not significantly better than 
that of the pure hybrid-origin scenario (model 
18) (table S6). Therefore, other mechanisms— 
including mitonuclear incompatibility, mater-
nal or paternal replacement of hybrid offspring
in a small population (9), and purging of mi- 
nor parental ancestry in low recombination 
regions—could have contributed to the sister 
relationship observed in the mitochondrial 
genome/Y-linked gene trees, and to the differ- 
ent genetic contributions from the two par- 
ents to the gray snub-nosed monkey during its 
hybrid origin. 
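The joint site frequency spectrum used as input to the coalescent simulations can be thought of as a table that counts SNPs by their derived-allele counts in each population. The Python sketch below builds such a table in the simplest possible way. The function name, the population order, and the example sites are hypothetical; the sketch does not reproduce the software or settings used in the study.

```python
from collections import Counter


def joint_sfs(sites):
    """Count SNPs by their tuple of derived-allele counts across populations.

    `sites` is an iterable with one entry per SNP; each entry lists the number
    of derived alleles observed in each population, in a fixed order.
    """
    sfs = Counter()
    for derived_counts in sites:
        sfs[tuple(derived_counts)] += 1
    return sfs


# Made-up derived-allele counts per SNP, ordered as
# (gray, golden, black-white/black); for illustration only.
sites = [
    (0, 1, 0),
    (2, 2, 0),
    (0, 1, 0),
    (1, 0, 3),
]
sfs = joint_sfs(sites)
for pattern, n_snps in sorted(sfs.items()):
    print(pattern, n_snps)
```

Competing evolutionary scenarios are then compared by how well the spectrum each one predicts matches the observed table.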


Genetic basis of the mosaic coat coloration 
of the gray snub-nosed monkey 


We next assessed whether genomic admixture 
gave rise to the peculiar pattern of hair col- 
oration of the gray snub-nosed monkey, which 
displays a mosaic of the coat color patterns of
both parental lineages, with golden hair on
the head and deltoid regions similar to that of
the golden snub-nosed monkeys, and black
hair on the lateral limbs (arms and legs) sim- 
ilar to that of the black-white/black snub-nosed 
monkeys. We first used spectrophotometric 
measurement to quantify the amount of two 
melanogenesis-related pigments (eumelanin 
and pheomelanin) produced by melanocytes 
in hairs of the gray snub-nosed monkey and 
its two parental representatives. We found 
that hairs on the head and deltoid regions of 
the golden snub-nosed monkey showed higher 
pheomelanin/eumelanin ratios than those from 
the same regions in the black-white snub-nosed 
monkey. The gray snub-nosed monkey pos- 
sessed elevated pheomelanin/eumelanin ratios 
on its head and deltoid regions similar to those 
Fig. 4. Quantitative measurement
of melanogenesis-related pigments
and genomic positive selection
analyses. (A) Spectrophotometric
measurement of eumelanin and
pheomelanin in different body parts
of the gray snub-nosed monkey and
two parental representatives, the
golden and the black-white snub-nosed
monkeys. (B) PSGs identified in the
gray snub-nosed monkey derived
from the golden snub-nosed monkey
population. Three melanogenesis-
related PSGs are indicated with arrows.
(C) PSGs identified in the gray snub-
nosed monkey derived from the black-
white/black snub-nosed monkey
populations. Two melanogenesis-related
PSGs are indicated with arrows.
of the golden snub-nosed monkey but exhibited
decreased ratios on its lateral limbs similar to 
those of the black-white snub-nosed monkey 
(Fig. 4A). 

We used a method developed recently (13) 
to identify positively selected genes (PSGs) 
that co-occur in a hybrid and one of its par- 
ents, which may account for their respective 
phenotypic and physiological similarities. In 
this way, we identified 416 PSGs shared by 
the hybrid gray snub-nosed monkey and the 
golden snub-nosed monkey (parent A) and 
414 PSGs by the gray snub-nosed monkey and 
the ancestor of the black-white/black snub- 
nosed monkeys (parent B) (Fig. 4, B and C). 
The PSGs represent genes that were under se-
lection in each parent lineage before hybrid-
ization and were alternately inherited by the
gray snub-nosed monkey during its origin. 
Some of these PSGs were found to be involved
mainly in functions related to melanogenesis 
(fig. S14). Among them, five PSGs (PAH, APC, 
SLC45A2, MYO7A, and ELOVL4) (Fig. 4, B and
C) have been reported to be related to pigmen- 
tation in the eye retina, skin, and coat and hair 
(14-18). Haplotype analyses of these five genes 
showed that the SLC45A2, MYO7A, and ELOVL4
genes of the gray snub-nosed monkey were 
derived from the golden snub-nosed monkey 
(parent A), whereas the PAH and APC genes 
were inherited from the ancestor of the black- 
white/black snub-nosed monkeys (parent B) 
(Fig. 5A and fig. S15). Previous studies in 
mammals (including humans) have reported 
that reduced expression of the SLC45A2 gene 
(parent A-derived) results in lower melano- 
somal pH and promotes the synthesis of 
pheomelanin in melanocytes (16, 19), leading 
to light or blonde skin or coat colors (20, 21).
By contrast, higher expression of the PAH gene 
(parent B-derived), which encodes a limiting 
enzyme for the conversion of phenylalanine to 
tyrosine during melanin synthesis (14, 22), is 
associated with darker skin or coat color in 
humans and mice (14, 23).

All fixed single-nucleotide polymorphisms 
(SNPs) between species in SLC45A2 and PAH
genes were located in promoter regions. Luciferase
reporter assays of SNPs in SLC45A2
showed that alleles derived from both the 
gray snub-nosed monkey and the golden snub- 
nosed monkey (parent A) caused significantly 
lower transcriptional activity than those from 
the ancestral lineage of the black-white/black 
snub-nosed monkeys (parent B) (Fig. 5B and 
fig. S16), suggesting that this gene likely plays 
a role in the development of the yellow coat 
Fig. 5. Haplotype analyses and functional assays of melanogenesis-related genes PAH and SLC45A2. (A) Haplotype analyses of SLC45A2 and PAH genes.
(B) Luciferase reporter assays of SLC45A2 and PAH genes. For each candidate gene, 293T and A375 cell lines were used. Blue, transcriptional activities induced by
alleles derived from both the black-white/black and the gray snub-nosed monkeys; red, transcriptional activities induced by alleles derived from both the golden
and the gray snub-nosed monkeys.


color parts of the gray snub-nosed monkey. 
On the other hand, luciferase reporter assays 
of fixed SNPs in PAH showed that alleles 
derived from both the gray snub-nosed mon- 
key and parent B caused significantly higher 
transcriptional activity than those from parent 
A (Fig. 5B and fig. S17), suggesting that this 
gene contributes to the development of the 
dark color areas of the coat of the gray snub- 
nosed monkey. Together, these results indi- 
cate that selection acting on early hybrids 
caused the retention and loss of alleles of these 
genes inherited from the two parental line- 
ages, which in turn changed the pheomelanin/ 
eumelanin ratio in different parts of
the body to produce the distinctive mosaic 
coat coloration found in the gray snub-nosed 
monkey. 


Discussion 


Our study has produced evidence for ge- 
nomic admixture among species of the genus 
Rhinopithecus and a hybrid origin of the gray 
snub-nosed monkey (Fig. 6). All of these spe-
cies are highly endangered primates, and as
yet, it has not been possible to quantify pre- 
mating and postmating reproductive isolation 
(RI) barriers between them or to determine 
whether hybridization led to the possible 
origin of such barriers between the gray snub- 
nosed monkey and its parents (24-26). How- 
ever, we have obtained evidence that allele 
recombination in the hybrid of different genes 
affecting type and distribution of hair pig- 
mentation between the two assumed paren- 
tal lineages likely produced the mosaic coat 
coloration of the gray snub-nosed monkey. 
Such phenotypic differentiation between spe- 
cies is known to be important in mate recog- 
nition and choice in birds (27), other primates 
(28, 29), and mammals (30), thus creating ef- 
fective premating RI (Fig. 6). Additionally, 
we might expect postmating RI to have arisen 
in the hybrid lineage through alternate fix- 
ing of alleles at different loci involved in 
genetic incompatibility [Bateson-Dobzhansky- 
Muller (BDM) incompatibility model] be- 
tween the parents (31, 32). We found that 


alleles at numerous loci of the gray snub-
nosed monkey are derived alternately from
its two parents (more than 18,000 SNPs and
more than 10,000 genes involved from each 
parent) (table S8). Although many of these 
allelic differences might be expected to be 
neutral in effect (31), it is feasible that some
have caused the hybrid species to exhibit 
BDM genetic incompatibility with both par- 
ents at its origin. Alternately inherited al- 
leles of multiple genes in the gray snub-nosed 
monkey are associated with reproduction 
(such as sperm development, oocyte matu- 
ration, and fertility) (figs. S18 and S19 and 
tables S9 and S10) and show high levels of 
divergence between the two parent lineages. 
If these divergent alleles are involved in hin- 
dering reproduction between these paren- 
tal lineages, their inheritance in the hybrid 
might have caused postmating RI to have 
developed between the hybrid and each par- 
ent, thus adding to the likely premating RI 
(Fig. 6). Premating isolation owing to differ- 
ences in coat coloration pattern (together with 
Fig. 6. Evolutionary scenario of the hybrid origin for the gray snub-nosed
monkey. Genome of the gray snub-nosed monkey was generated by the 
historical hybridization between the golden snub-nosed monkey (parent A) and 
the ancestor of the black-white/black snub-nosed monkeys (parent B). Under the 
hybrid origin context, melanogenesis-related PSGs derived from parent A 
(SLC45A2, MYO7A, and ELOVL4) and from parent B (PAH and APC) are likely 




geographic isolation) could accelerate the de- 
velopment of such genetic incompatibility be- 
tween the hybrid and its parents (31, 33). 
INTRODUCTION: Primates have evolved a diverse
set of social systems, from solitary living
to large multilevel societies. The traditional 
socioecological model explains this diversity as 
a response to changing environments, which 
shaped patterns of cooperation and competi- 
tion for resources and predator defense. How- 
ever, the socioecological model does not explain 
why sympatric species living in the same envi- 
ronment exhibit different social systems. There 
is a growing consensus that primate social 
organization shows a strong phylogenetic sig- 
nal as a result of shared inheritance from a 
common ancestor and evolved stepwise along 
with species differentiation. This implies a ge-
netic basis for the evolution of animal social
systems. However, the genomic mechanisms
that underlie the expression of primate social 
systems remain poorly understood. 


RATIONALE: Asian colobines, a subfamily of 
Old World monkeys, are represented by seven 
genera and 55 species that are distributed 
from tropical rainforests to snow-covered 
mountains. They exhibit four distinct types of 
social organization and provide a good model 
for examining the mechanisms that drive so- 
cial evolution from a common ancestral state 
to the diverse systems present today. By inte- 
grating new genomic data across all seven 
colobine genera with paleoenvironmental in- 
formation, the fossil record, social organization 
Adaptation for survival in cold climates facilitated evolution of social behavior in colobine monkeys. Cold
environments promoted the social evolution of Asian colobines in a stepwise manner. Genomic changes in
neurohormonal regulation, including in the dopamine and oxytocin pathways, improved social affiliation in odd-nosed
monkeys and thus promoted social aggregations from independent one-male, multifemale groups into large
multilevel societies. Ma, million years ago.
structed a socioecological-genomic framework to
identify selective pressures that form the genetic 
basis for social evolution in Asian colobines. 


RESULTS: To understand the evolutionary pro- 
cess of social systems in Asian colobines, we 
first reconstructed their phylogenetic relation- 
ships using whole-genome data. In contrast to 
the previous hypothesis of three major clades, 
our study reveals that Asian colobines split 
into two clades: the odd-nosed monkeys and 
the classical langurs. Our phylogenetic analy- 
ses detected a strong signal in colobine social 
evolution, suggesting that these social systems 
evolved in a stepwise manner, with ancestral 
one-male, multifemale groups fusing into semi- 
multilevel societies characterized by fission- 
fusion and then merging into complex multilevel 
societies. Consistent with our ecological re- 
sults indicating that extant colobine primates 
that inhabit colder environments tend to live 
in larger groups, we found that adaptations 
driven by ancient cold events, including the 
late Miocene cooling and Pleistocene glacial 
periods, played an important role in promot- 
ing these changes in social evolution. Further- 
more, our genomic analyses revealed that these 
cold events promoted the selection of genes 
involved in energy metabolism and neurohor- 
monal regulation. In particular, more-efficient 
dopamine and oxytocin pathways developed 
in odd-nosed monkeys, which might have re- 
sulted in the prolongation of maternal care 
and lactation, favoring infant survival in cold 
environments. These adaptive changes also ap- 
pear to have strengthened interindividual af- 
filiation, increased male-male tolerance, and 
facilitated the stepwise social aggregation from 
independent one-male, multifemale groups to 
large multilevel societies in Asian colobines. 


CONCLUSION: Our results reveal a stepwise 
evolutionary scenario of social organization in 
Asian colobines. We show that ancient glacial 
events selected for neurohormonal regulation, 
including dopamine and oxytocin pathways that 
promoted aggregation from one-male, multi- 
female groups into large multilevel societies. 
Our study demonstrates a direct link between 
a genomically regulated adaptation and social 
evolution in primates and offers new insights 
into the mechanisms that underpin behavioral 
evolution across animal taxa. 
The biological mechanisms that underpin primate social evolution remain poorly understood. Asian
colobines display a range of social organizations, which makes them good models for investigating social
evolution. By integrating ecological, geological, fossil, behavioral, and genomic analyses, we found
that colobine primates that inhabit colder environments tend to live in larger, more complex groups.
Specifically, glacial periods during the past 6 million years promoted the selection of genes involved in 
cold-related energy metabolism and neurohormonal regulation. More-efficient dopamine and oxytocin 
pathways developed in odd-nosed monkeys, which may have favored the prolongation of maternal care 
and lactation, increasing infant survival in cold environments. These adaptive changes appear to have 
strengthened interindividual affiliation, increased male-male tolerance, and facilitated the stepwise 
aggregation from independent one-male groups to large multilevel societies. 


Primates have evolved a diverse set of
social systems (1-3). From solitary living
and small families to large multilevel
societies, evolution associated with varied
behavioral tactics has allowed primates
to successfully exploit a wide range of
habitats (4-9). The socioecological model ex- 
plains the diversity of primate social systems 
as a response to changing environments, 
which shaped patterns of cooperation and 
competition for resources and predator de- 
fense (10-12). However, the socioecological 
model does not explain why sympatric species 
can live in the same environment but exhibit 
different social systems (13, 14). 

Evidence increasingly supports that the so- 
cial system of different primate taxa is likely 
inherited from a recent common ancestor, 
rather than evolving as a direct adaptation to 
current environmental conditions (15, 16). For
example, although they inhabit the same rain-
forest, white-handed gibbons form monoga-
mous pairs, whereas Thomas’s langurs live in 
a one-male, multifemale polygynous group; 
long-tailed macaques live in multimale, multi- 
female groups; and Bornean orangutans live 
solitarily with occasional social contact (17).
Therefore, there is a growing consensus that 
certain components of social systems have a 
strong phylogenetic signal (5, 18) and evolved 
in a stepwise manner in conjunction with species
differentiation (16, 19). However, the genomic
mechanisms that constrain or promote
the expression of primate social systems re- 
main poorly understood (20, 21). 

Asian colobines, a subfamily of Old World 
monkeys, are represented by seven genera and 
55 species that are distributed from tropical 
rainforests to snow-covered mountains. They 
exhibit four distinct types of social organiza- 
tion and provide a good model for examining 
the multiple mechanisms that have driven their 
social evolution from a common ancestral state 
to the diverse systems that are present today 
(Fig. 1 and data S1). These Asian colobines are 
categorized into two clades (22). The classical 
langurs (genera Presbytis, Semnopithecus, and 
Trachypithecus) are each principally charac- 
terized by a one-male, multifemale unit; poly- 
gynous mating; and strict male territorial 
defense. In addition, a small number of species 
in this clade such as the Himalayan gray langur 
(Semnopithecus schistaceus) and the Indo- 
chinese langur (Trachypithecus crepusculus) 
exploit high-altitude forests and occasionally 
form cohesive larger multimale, multifemale 
groups (Fig. 1C). By contrast, species in the odd- 
nosed monkey clade exhibit a wide spectrum 
of social systems. Simakobus (genus Simias)
live in independent one-male, multifemale
units, whereas doucs (genus Pygathrix) and 
proboscis monkeys (genus Nasalis) live in dis- 
tinct nonterritorial one-male, multifemale units, 
which seasonally fuse into a single breeding 
band or aggregate together at nighttime sleep- 
ing sites (23) (data S1). We term these semi- 
multilevel societies because this social system 
is characterized by flexibility in switching be- 
tween independent one-male, multifemale units 
and multilevel societies. The last group of odd- 
nosed monkeys are the snub-nosed monkeys 
(genus Rhinopithecus). They live in typical 
multilevel societies, which are composed of sev- 
eral core one-male, multifemale units embedded 
within a stable and larger social matrix and 
associated all-male bachelor bands (24). 

In this study, we integrated newly acquired 
de novo high-quality genome data represent- 
ing all seven colobine genera with paleo- 
environmental information, the fossil record, 
type of social organization, level of intrasexual 
tolerance, and ecological databases from 2189 
habitat locations (data S2) of 48 extant Asian 
colobine species. This allowed us to construct a 
comparative dynamic socioecological-genomic 
framework that identifies the genetic basis of 
social evolution in primates. 


Phylogeny reconstruction 


To understand the social evolution of Asian 
colobines, we clarified their phylogenetic rela- 
tionships and natural histories. To resolve 
previous inconsistencies concerning colobine 
phylogenetic relationships (25, 26), we se- 
quenced and analyzed seven de novo genomes of 
species from all seven genera of Asian colobines 
[supplementary materials (SM) section 3.3.1]. 
Based on a combination of the concatenation 
method and the coalescent method, a new 
phylogenomic tree was reconstructed from 
a total of 4992 one-to-one orthologs (fig. S7).
With calibrations from new fossil discov- 
eries, we were able to develop greater preci- 
sion in divergence time estimates (Fig. 2A). 
This new high-confidence topological struc- 
ture enabled us to trace the evolutionary his- 
tory of social systems in Asian colobines. The 
results revealed that Asian colobines split into 
two well-supported clades: the odd-nosed mon- 
keys and the classical langurs. The genera 
Presbytis, Semnopithecus, and Trachypithecus 
are best described as a monophyly of the clas- 
sical langurs (Fig. 2A). These results contrast 
with the hypothesis of three major clades, with 
Presbytis located at the basal position of an 
independent monophyly, which was proposed 
in previous studies (27, 28). 


Phylogenetic signal of social evolution 


To understand how the set of social organi- 
zations of extant Asian colobines was shaped 
by their phylogenetic lineage, we used phylog- 
eny trait reconstruction modeling. Based on 



Fig. 1. Taxonomy and the social systems of Asian colobines. (A) Classification and vertical distribution of Asian colobines. (B) Geographical distribution of 

odd-nosed monkeys and classical langurs. (C) Group size increases with increasing elevation and latitude in both odd-nosed monkeys and classical langur species. MMU, multimale, 
multifemale unit; OMU, one-male, multifemale unit. (D) Stepwise evolution of social systems in Asian colobines. MLS, multilevel society; MMG, multimale, multifemale group; 
OMG, one-male group. [Credits: All monkey illustrations are copyrighted 2014 by Stephen D. Nash/IUCN/SSC Primate Specialist Group and used with permission] 


the new phylogenomic tree (fig. S2, data S3, and 
SM section 3), we used Pagel’s λ (29) and Phylo.D 
(30) to evaluate the strength of the phylogenetic 
signal in their social evolution. The results showed 
a strong signal [Pagel’s λ = 0.81, log likelihood 
(LL) = 34.98, probability of λ resulting from 
Brownian model (P_λ,Brownian) = 1; estimated 
Phylo.D (D) = -0.44, probability of D resulting 
from Brownian model (P_D,Brownian) = 0.87] in 
colobine social evolution (table S6 and SM 
section 4.1). Next, we used a macroevolutionary 
model fitting analysis to compare the fit factors 
of phylogenetically associated models [λ, κ, δ, 
early burst (EB)] with nonphylogenetic models 
(white-noise model) in Asian colobines. The re- 
sults showed that the likelihood of each of the 
four phylogenetically associated models was sig- 
nificantly higher than that of the white-noise 
model (table S10). These results indicate that dur- 
ing their evolutionary history, phylogeny was a 
relevant driving factor rather than a random 
factor in colobine sociality (SM section 4.1.2). 
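
As an illustration of what Pagel’s λ measures, the sketch below applies the standard λ transformation to a phylogenetic variance-covariance matrix: off-diagonal entries (shared branch lengths) are multiplied by λ, so λ = 1 leaves the Brownian expectation unchanged and λ = 0 removes all phylogenetic covariance. The three-taxon matrix and the function name `lambda_transform` are hypothetical and are not taken from the study; only the value λ = 0.81 comes from the text.

```python
def lambda_transform(vcv, lam):
    """Scale the off-diagonal entries of a phylogenetic
    variance-covariance matrix by lam (Pagel's lambda).

    The diagonal (root-to-tip depth of each taxon) is left unchanged.
    For simplicity this sketch restricts lam to [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1] in this sketch")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical three-taxon tree of depth 8 (arbitrary units):
# taxa 0 and 1 share 6 units of history; taxon 2 shares none.
vcv = [
    [8.0, 6.0, 0.0],
    [6.0, 8.0, 0.0],
    [0.0, 0.0, 8.0],
]

brownian = lambda_transform(vcv, 1.0)   # identical to the input matrix
star = lambda_transform(vcv, 0.0)       # no phylogenetic covariance left
observed = lambda_transform(vcv, 0.81)  # lambda value reported in the text
```

A fitted λ close to 1, as reported here, means that trait covariance among species is close to what the tree predicts under Brownian motion.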


To verify whether social evolution in Asian 
colobines was stepwise, we compared ordered 
(stepwise) models with an unordered evolu- 
tion model using MultiState in BayesTraits. 
By comparing the marginal likelihoods among 
the three candidate stepwise models (SM sec- 
tion 4.1.3) and the unrestricted model (un- 
ordered model) (fig. S2), a strong Bayes factor 
(log BF model_OMM or OSM > 10; see SM section 
4.1.3) suggested that Asian colobine social 
systems evolved in a stepwise manner (fig. S2A 

Fig. 2. Phylogenetic relationship and social system evolution in Asian 
colobines. (A) DensiTree presentation of phylogenetic trees of orthologous 
genes. Gene trees with a clade probability larger than 30% are shown. 

Mya, million years ago. (B) Social system evolution in Asian colobines. The 
pie chart at each ancestral node shows the reconstructed ancestral social 
state. Different colors correspond to the estimated social systems and 

are proportional to their posterior probability. (C) The posterior distributions 


and table S8). Therefore, we investigated these 
lineage-specific evolutionary pathways in great- 
er detail. 
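
The model comparison above rests on a log Bayes factor computed from marginal likelihoods. The sketch below assumes the convention commonly documented for BayesTraits, log BF = 2 × (log marginal likelihood of model A − log marginal likelihood of model B), with values above 10 usually read as very strong support for model A. The marginal likelihoods used here are hypothetical placeholders, not values from the study, and the function names are illustrative.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of model A over model B from their
    log marginal likelihoods (assumed convention: twice the difference)."""
    return 2.0 * (log_ml_a - log_ml_b)


def evidence_label(log_bf):
    """Map a log Bayes factor to a conventional verbal category."""
    if log_bf < 2:
        return "weak"
    if log_bf < 5:
        return "positive"
    if log_bf <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods (for illustration only).
log_ml_stepwise = -40.0   # an ordered (stepwise) model
log_ml_unordered = -46.5  # the unrestricted (unordered) model

log_bf = log_bayes_factor(log_ml_stepwise, log_ml_unordered)
print(log_bf, evidence_label(log_bf))
```

With these placeholder numbers the stepwise model would be favored, which mirrors the direction of the result reported in the text but not its actual magnitude.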

We traced the set of social conditions for 
each of the ancestral nodes using a Bayesian 
phylogenetic framework (SM section 4.1.4 and 
Fig. 2B). The results showed that the most likely 
ancestral social state of Asian colobines (Fig. 2B) 
was an independent one-male, multifemale unit 
[probability of ancestral state (ASP_OMU) = 0.76 ± 
0.16]. Based on the Bayesian phylogenetic 
framework results, we identified three line- 
ages of ancestral Asian colobines, each with 
a different social evolutionary history (Fig. 2B 
and fig. S2). The first lineage retained the 
ancestral one-male, multifemale unit system 
that is present in most of the classical langurs, 
such as Presbytis (Fig. 2, B and C). The second 
lineage included a small number of classical 
langurs, such as the Indochinese langur 
(T. crepusculus) (Fig. 2B), that inhabit moun- 
tainous regions and tend to merge into larger 
multimale, multifemale groups. This contrasts 
with their sister species that live in warmer 
lowland regions and form single or inde- 
pendent one-male, multifemale units. 

The third evolutionary pathway is related 
to the stepwise aggregation of core one-male, 
multifemale units into multilevel societies that 
characterize the odd-nosed monkey clade. The 
Bayesian phylogenetic framework results indi- 
cate that in this lineage, the ancestral inde- 
pendent one-male, multifemale units aggregated 
into semi-multilevel societies after splitting 
from the common ancestor of the odd-nosed 
monkey clade about 6.5 million (7.0 million to 

of the probabilities (x axis) of each of the ancestral nodes marked in (B). 
C.A., common ancestor. (D) High principal components PC1 scores indicate 
high temperature and humidity. (E) Low principal components PC2 scores 
indicate wide ranges of seasonality, annual temperature, and precipitation. The 
scatter size in a species is proportional to its group size. [Credits: All monkey 
illustrations are copyrighted 2014 by Stephen D. Nash/IUCN/SSC Primate 
Specialist Group and used with permission] 


5.7 million) years (Ma) ago (Figs. 2, B and C, 
and 3A). Subsequently, the lineage leading to 
the extant doucs (Pygathrix) and proboscis 
monkeys (Nasalis) inherited this social sys- 
tem, with multiple one-male, multifemale units 
sharing a home range through a process of 
fusion-fission (data S1 and S7). Simias, by con- 
trast, independently reverted to an ancestral- 
like social system characterized by independent 
one-male, multifemale units. Our results indicate 
that the snub-nosed monkeys (Rhinopithecus) 
represent the second step of social aggregation 
from semi-multilevel societies to typical multi- 
level societies, with multiple one-male, multi- 
female units forming a large stable breeding 
band in which residents travel, rest, and feed to- 
gether throughout the year. The breeding band, 
which may include more than 100 individuals, 

Fig. 3. Natural history of Asian colobines. (A) Reconstructed phylogenetic relationship of Asian 


colobines. The node bars indicate the 95% confidence interval for each branch. (B) Demographic history 
of seven Asian colobines estimated by PSMC. The regions marked with a vertical blue bar correspond 
to glacial periods. g, generation time; μ, mutation rate. (C) Historical sea surface temperature and 
before the present) period. [Credits: All monkey illustrations are copyrighted 2014 by Stephen D. Nash/ 

is shadowed by all-male bachelor bands (Fig. 
2B). These results demonstrate that social 
evolution in Asian colobines represents a newly 
discovered two-step pathway from ancestral 
independent one-male, multifemale units to 
large aggregated multilevel societies. This path- 
way is distinct from that of African papionins 
(e.g., gelada, hamadryas baboon), whose multilevel 
societies evolved through the internal 
fissioning of large multimale, multifemale 
groups (4, 31). 


Social systems under contrasting environments 


To understand how ecological factors have 
shaped primate social evolution, we constructed 
an Asian colobine ecological dataset (data S2) 
based on 19 bioclimatic variables that were 
extracted from a total of 2189 current locations 
across the ranges of 48 extant Asian colobine 
species (data S2). Based on principal components 
analyses, we found that species that are 
presently distributed in colder, drier, and more 
seasonal climates tend to live in larger groups, 
whereas species that inhabit warmer and 
moister environments tend to form smaller 
groups (Fig. 2, D and E). The mean and sta- 
bility of temperature and humidity were iden- 
tified as the main factors that affect group size 
in odd-nosed monkeys (which explained 84.8% 
of the variance) and classical langurs (which 
explained 85.7% of the variance) (table S9). 
Furthermore, the random-walk model for con- 
tinuous traits in BayesTraits (32) showed that 
group size was negatively correlated with annual 
mean temperature [Pagel’s λ = 0.59; correlation 
coefficient (R) = −0.69; log BF = 17.75, 


which is greater than 10], indicating that cold 
conditions may have selected for increased group 
size in both clades of Asian colobines (SM sec- 
tion 4.3.2). This pattern of enhanced sociality in 
cold and dry environments has also been re- 
ported in Australian rodents (33) and cooper- 
ative breeding birds (34). In the case of Asian 
colobines, transitions from one social system 
to another appear to have occurred at ancient 
evolutionary nodes and have been retained 
over long periods of time. This suggests that 
colobine social systems may reflect adapta- 
tions to ancient environmental conditions rather 
than a direct response to current environmen- 
tal conditions. 


Evolutionary history and radiation 


Assuming that ancient ecological factors played 
an important role in promoting stepwise 
social evolution (Figs. 1 and 2), we traced the 
natural and social evolutionary history of Asian 
colobines over the past 8 Ma. This was accom- 
plished by integrating data from new discov- 
eries in the fossil record (data S4), paleogeology, 
paleogeography, paleoclimate, and historical 
sea level dynamics (data S5), as well as the 
present geographical distribution of individual 
Asian colobine taxa (data S2). Using 
BioGeography with Bayesian and likelihood 
evolutionary analysis, we reconstructed the 
ancestral distribution pattern of Asian colobines 
(SM section 4.4.3 and fig. S12). In comparing 
the likelihoods of the resulting candidate mod- 
els, with results from geographic and multiple- 
state speciation and extinction analyses (SM 
section 4.4.2 and fig. S11), we found that an- 
cient dispersal routes and geographic isola- 
tion appear to have played important roles in 
Asian colobine speciation (Fig. 3D). 

In contrast to the previous hypothesis that 
ancestral colobines dispersed into Asia via a 
northern route through China (35), we com- 
bined data on newly reported Mesopithecus 
fossils (7.9 to 7.0 Ma ago) found in Pakistan, Iran, 
and Afghanistan (SM section 4.4.5 and fig. S3) 
that support an alternative scenario. The com- 
mon ancestor of Asian colobines, Mesopithecus, 
first entered Eastern Asia via the Indian sub- 
continent during the late Miocene (10.8 to 7.8 Ma 
ago) (Fig. 3D and data S4). Integrating this 
scenario with divergence times estimated from 
our newly constructed phylogenomic tree, we 
suggest that Mesopithecus spread throughout 
India and then divided into two clades at about 
7.6 (8.0 to 6.7) Ma ago (Fig. 3, A and D). 

One clade likely gave rise to the common 
ancestor of classical langurs, including Presbytis, 
Semnopithecus, and Trachypithecus, within a 
monophyletic clustering (Fig. 3A). Because of 
the uplifting of the Himalayas, some elements 
of this radiation spread eastward through the 
Indo-China Peninsula into warmer tropical 
forests in Sundaland during the late Miocene, 
around 7.4 (7.8 to 6.6) Ma ago. This group 
evolved into the genus Presbytis (Fig. 3, A 
and D). During the Pliocene, about 4.9 (5.6 to 
4.2) Ma ago, other members of this clade 
divided into two populations. One remained 
in the Indian subcontinent and evolved into 
Semnopithecus, whereas the other migrated 
eastward, spreading into southwest China and 
the Indo-China Peninsula in the Pleistocene. This 
lineage evolved into the genus Trachypithecus 
(Fig. 3, A and D). 


Cold events promoted social aggregation in 
odd-nosed monkeys 


In contrast to the classical langurs, our results 
suggest that cold events played an important 
role in adaptation and social aggregation along 
with speciation in the common ancestor of 
odd-nosed monkeys (Fig. 3). Combined with 
the new fossil Mesopithecus pentelicus, which 
was found in Zhaotong, Yunnan Province, 
China (identified as the most recent common 
ancestor of the odd-nosed monkey clade) (36) 
and was dated to 6.4 (6.7 to 6.0) Ma ago during 
Late Miocene Cooling (7.0 to 5.4 Ma ago), we 
propose that the ancestor of odd-nosed mon- 
keys dispersed eastward from the Indian sub- 
continent, along the uplifted Himalayas, and 
then dispersed into the southeastern margin 
of the Tibetan Plateau (Hengduan Mountains 
region) (7.6 to 6.5 Ma ago) (Fig. 3, A and C). 
Paleoenvironmental evidence shows that after 
their arrival, the common ancestor of odd- 
nosed monkeys encountered a cooler and 
drier climate caused by the rapid uplifting 
of the Hengduan Mountains (8.0 to 6.0 Ma 
ago) during a global cooling period in the late 
Miocene (Fig. 3D and data S5). A changing 
monsoon climate in the area further 
enhanced the cooling effects (fig. S3 and 
data S5). These events coincided with the evo- 
lution from an ancestral one-male, multifemale 
unit to a semi-multilevel society in odd-nosed 
monkeys (Figs. 2B and 3B). The results indi- 
cate that adaptations related to these cold 
events appear to have resulted in larger and 
more aggregated social groups in the odd- 
nosed monkey clade (Fig. 3). 

Subsequently, the ancestors of odd-nosed 
monkeys evolved into four genera (Fig. 2A). 
Along with these cold events, the common 
ancestor of proboscis monkeys (Nasalis) and 
simakobus (Simias) migrated southward, cross- 
ing the land bridge that connected isolated 
islands in Southeast Asia (Sundaland) at about 
6.5 (7.0 to 5.7) Ma ago. This radiation dispersed 
into tropical forests as far as Sumatra and 
Borneo (Fig. 3, D and E), facilitated by a fall in 
sea level caused by expanding ice sheets in the 
polar regions during glacial events (Fig. 3 and 
fig. S3). 

The ecological niche modeling and pairwise 
sequentially Markovian coalescent (PSMC) 
analyses suggest that alternating glacial and 
interglacial events during the Pleistocene resulted 
in reconnection and disconnection of 
land bridges as well as the expansion and con- 
traction of suitable habitats (Fig. 3, B and E). 
This led to the isolation and divergence of 
proboscis monkeys and simakobus about 1.4 
(2.4 to 0.8) Ma ago (Fig. 3, A and D). This 
dispersal scenario is consistent with the semi- 
multilevel society social grouping pattern main- 
tained by proboscis monkeys, even though 
they presently inhabit warmer environments. 
By contrast, simakobus, which today only in- 
habit the Mentawai Islands west of Sumatra, 
reverted to independent one-male groups, sim- 
ilar to the Asian colobine ancestral condition. 

The remaining odd-nosed monkeys gave 
rise to the common ancestor of doucs (Pygathrix) 
and snub-nosed monkeys (Rhinopithecus), which 
adapted to the cold climate present in the 
northern region of East Asia during the Late 
Miocene Cooling (6.5 to 6.2 Ma ago). Later, a 
branch of this radiation migrated south into the 
Indo-China Peninsula and evolved into Pygathrix 
at 6.2 (6.6 to 5.4) Ma ago (Fig. 3A). The PSMC 
analysis also showed that an expansion in the 
effective population size of doucs was associated 
with an increase in cold temperatures during 
the middle and late Pleistocene glacial event 
(Fig. 3B). Compared with the semi-multilevel 
societies of proboscis monkeys, in which non- 
territorial one-male, multifemale units aggre- 
gate together only at night, the semi-multilevel 
societies of doucs are characterized by an ex- 
tended aggregation period during the rainy 
season. The more cohesive semi-multilevel 
societies of doucs appear to be related to a 
longer period of inhabiting glacial environ- 
ments in colder northern regions compared 
with proboscis monkeys. 

By contrast, the snub-nosed monkeys (genus 
Rhinopithecus) evolved from an ancestral line- 
age that remained in the north and exper- 
ienced all major Pleistocene glacial cold events 
in high-latitude forests (data S1 and S2). Today, 
four of the five Rhinopithecus species are con- 
strained to high-altitude temperate mountain 
forests up to 4500 m. These habitats are char- 
acterized by relatively cool summers and ex- 
tended cold winters. This includes the golden 
snub-nosed monkeys (Rhinopithecus roxellana), 
which occupy the northern-most distribution 
of all colobine species (Fig. 1B). Through step- 
wise social evolution, snub-nosed monkeys 
evolved a social system distinguished by larger 
group size, increased male intrasexual toler- 
ance, and the stable social aggregation of one- 
male, multifemale units that characterize their 
typical multilevel societies (Fig. 2B). 


Colobine genomic evolution 


These phylogenetic-based and cold-driven evo- 
lutionary scenarios point to a potential genetic 
mechanism that promoted the stepwise process 
of social aggregation in Asian colobines. Eco- 
logical pressures may have selected for genomic 


changes early in colobine evolution that pro- 
moted an expansion of prosocial behaviors. 
Therefore, to identify the genetic basis of pri- 
mate social evolution, in addition to the ref- 
erence genomes of two African colobines as 
outgroups, we provide 10 genomes that rep- 
resent all seven genera of Asian colobines, 
including six genomes from all four genera 
of odd-nosed monkeys (table S2). 

Given that the ancestor of the odd-nosed 
monkey clade was initially aggregated into 
semi-multilevel groups in response to glacial 
events, based on the genomes of four extant 
genera, we reconstructed the genome of the 
common ancestor of odd-nosed monkeys using 
likelihood-based and maximum parsimony 
methods. Based on the branch-site and branch 
model in phylogenetic analysis by maximum 
likelihood (PAML) (37) and the evolutionary 
rate model (38), we compared the adaptive 
divergence between the ancestral odd-nosed 
monkey and other primates in coding genes, 
as well as the conserved model generated by 
PhastCons (39) and the aov.phylo model in 
GEIGER (40) for comparison of the conserved 
noncoding elements (CNEs). For coding genes, 
we identified 78 candidate positively selected 
genes and 371 candidate rapidly evolving genes 
from a total of 17,191 one-to-one orthologous 
genes from whole-genome alignment. We then 
filtered these candidate genes to reduce false- 
positive results (SM section 5.1.6) and detected 
30 positively selected genes and 228 rapidly 
evolving genes (P < 0.05) (tables S14 and S16). 
After obtaining the QQplot from all ortholo- 
gous genes (fig. S15) and the false discovery 
rate corrections, we further noticed a set of 
genes with higher levels of significance (tables 
S14 and S16). These genes are associated with 
multiple functions. For example, the positively 
selected gene HMCN2 is involved in lipid 
metabolism (41) and may aid in energy maintenance 
in cold environments. We also identified 
LTBP2 and FLNC as rapidly evolving genes, 
which are involved in adipocyte differentia- 
tion and fat degradation (42, 43) and may be 
associated with nonshivering thermogenesis 
to increase body heat during periods of low 
temperature (44). In addition, we found a set 
of rapidly evolving genes (table S16) related 
to neurohormonal regulation, such as DLGAP3 
and AP2A1, which are involved in neurotransmission 
systems, including the system that involves 
5-hydroxytryptamine, which regulates grooming 
and other social behaviors (45, 46). 
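
The filtering described above relies on false discovery rate corrections, but the text does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg method, the adjusted P value of the gene ranked k out of m is the smallest value of p × m / rank taken over all genes ranked k or lower in significance. The P values below are hypothetical and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four candidate genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Genes that remain significant at an FDR threshold of 0.05.
kept = [i for i, q in enumerate(adj_p) if q < 0.05]
```

Other FDR procedures exist, so this sketch should be read as one plausible choice rather than the method used in the study.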

In addition, we obtained a total of 23,038 
CNEs and 4351 ultraconserved noncoding ele- 
ments (UCNEs) and identified 636 specific 
CNEs and 283 fast-evolving UCNEs (P < 0.05) 
in ancestral odd-nosed monkeys that distin- 
guished them from the outgroups (SM section 
5.1.2 and Fig. 4A). Focusing on the selected 

Fig. 4. Genome landscape of Asian colobines associated with social evolution. 
(A) Enrichment analysis of specific CNEs (blue dots) and genes (red dots) in the 
common ancestor of odd-nosed monkeys. The full results are shown in tables S19 and 
S22. FDR, false discovery rate. (B) Genome-wide PGLS analyses between the 
evolutionary rate and group size across Asian colobines. Genes enclosed in rectangles 
indicate that their evolutionary rate (dN/dS, where N is the number of nonsynonymous 


genes and UCNE- and CNE-associated genes, 
we annotated these genes to the Gene Ontol- 
ogy terms and the Kyoto Encyclopedia of 
Genes and Genomes (KEGG) pathway data- 
base and performed gene enrichment analyses 
using the KEGG Orthology Based Annotation 
System (KOBAS) (47) (SM section 5.2). The 
results showed that most of the high-ranking 
significant Gene Ontology terms and the path- 
ways were involved in immunity, fat metabo- 
lism, and adaptations to a high-cellulose diet 
(Fig. 4A and fig. S17). These pathways are associated 
with energy- and heat-acquiring 
pathways that maintain body temperature to 
survive in the cold, such as the phagosome and 
Chagas pathways (SM section 5.2 and table 
S18). In addition, based on the evolutionary 
rate model, the analysis of rapidly evolving Gene 
Ontology terms also distinguished similar pat- 
terns as the enrichment analyses described 
earlier in this section, such as mammary gland 
development, fatty acid metabolism, and cellu- 
lar glucose homeostasis (figs. S17 and S18). Im- 
portantly, both of these analyses revealed that 


sites and S is the number of synonymous sites) is significantly correlated with group 
size. R², coefficient of determination. (C) The correlation coefficients of each 

of the genes in each of the distinguished pathways. (D) Regression analysis between 
dN/dS and group size of the pathways. (E) Genes exhibit specific mutations in the 
odd-nosed monkeys. Genes involved in the oxytocin pathway are colored blue, 
whereas those involved in the dopamine pathway are colored red. 


genes associated with neurohormonal regu- 
lation were significantly enriched (Fig. 4A). 
These results imply that cold-related energy 
metabolism and neurohormonal evolution ap- 
pear to have jointly evolved in the common 
ancestor of the odd-nosed monkey clade. 


Genome-wide association with social evolution 


Based on these results, we investigated ge- 
nomic changes in all extant Asian colobines 
that are relevant to social aggregation by ex- 
ploring the potential genes and pathways that 

Fig. 5. Mutations in genes that encode proteins in the oxytocin and 
dopamine pathways and functional validation in odd-nosed monkeys. 

(A) Nucleotides inserted into the UCNEs in odd-nosed monkeys. (B) Genetic 
changes identified in the oxytocin pathway. (C and D) Genetic changes identified 
in the dopamine synthesis process as well as signal transduction in and 
cellular regulation of the dopamine pathway. (E) Comparison of the proportion 




of selected genes in the oxytocin (OXT) and dopamine (DA) pathways. (F) Three- 
dimensional views of the DRD1 protein of douc monkeys. (G and H) The 

in vitro receptor activity tests for OXTR and DRD1. R. roxellana and R. bieti share 
the same DRD1 amino acid sequence. (I) Prosocial behavioral characteristics 
related to the oxytocin and dopamine pathways in Asian colobines. For (H) and 
(I), *P < 0.05 and **P < 0.01. 


correlated with the group-size spectrum from 
one-male, multifemale groups to multilevel 
societies. First, we constructed an orthologous 
gene set that focused on neurohormonal sys- 
tems from nine genomes, including those 
of extant odd-nosed monkeys and classical 
langurs. Following the a priori candidate genes 
method (48, 49), we obtained a total of 2103 
orthologous genes that are annotated with 
functions in neurohormonal regulation 
and social behavior in Gene Ontology 
and the KEGG pathway database (table S25). 
Focusing on these 2103 genes, we next per- 
formed correlation analyses and used mean 
group size as a continuous variable to represent 
different forms of social organization to compare 
the evolutionary rate of each gene across 
species. Based on a phylogenetic generalized 
least squares (PGLS) regression analysis (50) 
(SM section 5.4), we detected 213 genes that 
were positively correlated and 66 genes that 
were negatively correlated with group size 
(Fig. 4B and table S26). 

Then, focusing on these correlated genes, 
we performed two independent analyses, the 
enrichment analyses and the pathway corre- 
lation analyses to distinguish the specific path- 
ways that correlated with group size. The 
enrichment analyses from these 213 and 66 genes 
using KOBAS distinguished 349 pathways that 
exhibited significant P values after correction 
for false discovery rates. We then ranked these 
pathways based on the P values (table S28). 
For the pathway correlation analyses, we fo- 
cused on the 213 positively correlated genes, 
which may serve multiple functions across 
pathways, and recategorized these genes into 
105 corresponding pathways. By comparing 
the evolutionary rate for each gene of each 
species in a pathway with mean group size in 
the corresponding species, we estimated the 
Spearman’s correlation coefficients for each 
pathway. We then ranked these pathways by 
their correlation coefficients (tables S29 and 
S30 and SM section 5.4). 
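
The pathway correlation analysis above can be sketched as a Spearman rank correlation between mean group size and an evolutionary rate summary per species, followed by ranking pathways by the coefficient. In the sketch below, the group sizes, dN/dS values, and pathway names are all hypothetical placeholders; the helper functions are illustrative and do not reproduce the phylogenetic corrections described in the SM.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean group sizes for six species.
group_size = [8, 12, 15, 40, 120, 300]

# Hypothetical per-species dN/dS summaries for two placeholder pathways.
pathway_rates = {
    "pathway_1": [0.10, 0.12, 0.11, 0.18, 0.22, 0.30],
    "pathway_2": [0.20, 0.19, 0.21, 0.20, 0.18, 0.22],
}

rho = {name: spearman(group_size, rates) for name, rates in pathway_rates.items()}
ranked = sorted(rho, key=rho.get, reverse=True)
```

Ranking on rho rather than on raw rates keeps the comparison robust to the very different scales of group size across species.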

The results of both analyses showed that 
high-ranking pathways were primarily asso- 
ciated with categories of energy metabolism, 
neural signal transmission regulation, and im- 
munity that may relate to group living (tables 
S28 and S29). For example, the regulation of 
lipolysis in adipocytes is associated with glucose 
and lipid metabolism (51). These pathways 
are relevant to energy demands and 
utilization and help to maintain body temper- 
ature and compensate for heat loss in cold 
environments (52). These high-ranking path- 
ways also include those involved during the 
bacterial invasion of epithelial cells, which 
are reported to facilitate infection avoidance 
(53, 54). These same pathways also appear to 
function in cellulose fermentation by the gut 
microbiome, which is related to the folivorous 
diet of colobine primates (55). In addition, both 
analyses indicated that the remaining high- 
ranking pathways are engaged in neural sig- 
nal transmission and regulation, such as the 
sphingolipid signaling pathway, which is asso- 
ciated with brain development and neural sys- 
tem maintenance (56) (SM section 5.4), as well 
as the particular hormones such as glutamate, 
dopamine, oxytocin, and 5-hydroxytryptamine 
(tables S28 and S29). 

Moreover, both of the analyses distinguished 
pathways related to materials that function in 
neuron structure and the neuronal connec- 
tivity system, including axon guidance, cho- 
linergic synapse, and synaptic vesicles (tables 
S28 and S29). The enrichment analyses also 
distinguished dendrite, dendritic spine, synapse, 
and neuron projection as high-ranking 
Gene Ontology terms (table S28). These find- 
ings lay the structural foundation for signal 
transduction in the neural interaction network 
(57). Importantly, based on the enrichment 
analyses, the axon guidance and cholinergic 
systems, which were the first- and the sixth- 
highest-ranking pathways estimated from the 
KEGG database, are reported to affect and con- 
trol dopamine release (58). Moreover, these 
analyses also distinguished the mitogen- 
activated protein kinase signaling and glu- 
tamatergic synapse pathways, which mediate 
downstream calcium signaling for the oxyto- 
cin and dopamine pathways (Fig. 4, C and D, 
and tables S29 and S30). These neurotransmit- 
ter systems, and the particular hormone types 
that they serve, suggest that neurohormonal 
regulation, including the oxytocin and dopa- 
mine pathways, is significantly related to group 
size in extant Asian colobines. 

Therefore, we explored how neurohormonal 
systems, including the dopamine and oxytocin 
pathways, function in social behavior and the 
evolution of social group size. Oxytocin and do- 
pamine play essential roles in maternal reward 
attachment, strengthening the mother-infant 
bond and maintaining nursing (59-63). Mam- 
mals living in colder environments tend to in- 
crease maternal investment, such as prolonging 
lactation and huddling periods to avoid infant 
exposure during the cold season (64-66). There- 
fore, we hypothesized that in response to cold 
temperatures, more efficient oxytocin and do- 
pamine pathways were selected for in the odd- 
nosed monkeys, resulting in enhanced maternal 
care and infant survival. Furthermore, higher 
levels of oxytocin and dopamine also pro- 
mote interindividual affiliation, mitigate inter- 
group conflict, and increase social bonding 
(67, 68). This could have facilitated increased 
cooperation and neighbor-male tolerance (69, 70) 
and thus may have favored social aggregation 
from independent one-male, multifemale groups 
to multilevel societies. 


Rapid evolution in the oxytocin and dopamine 
pathways is related to social aggregation 


To understand the adaptive changes in the 
oxytocin and dopamine pathways, we compared 
all 104 oxytocin-related and 96 dopamine- 
related orthologous genes (table S31) among 
snub-nosed monkeys, which represent a multi- 
level society; ancestral odd-nosed monkeys, 
which represent a semi-multilevel society; and 
classical langurs, which form independent one- 
male, multifemale units. By using PAML (37), 
hypothesis testing using phylogenies (71), and 
specific amino acid change (72), our results 
show that 22 (21.2%) genes in the oxytocin 
pathway and 20 (20.8%) genes in the dopa- 
mine pathway were selected in species that 
form multilevel societies. This is significantly 


higher than the 12 (11.5%) and 10 (10.4%) genes 
selected in the same pathways of species that 
form semi-multilevel societies (SM section 5.5 
and tables S32 and S33), as well as signifi- 
cantly higher than the four (3.8%) and three 
(3.1%) genes selected in Asian classical langurs 
that form independent one-male, multifemale 
groups (chi-square test; Fig. 5E). This pattern 
of genome-wide change in neuron structures 
to signal transmission across different clades 
is consistent with differences in the level of 
social aggregation from one-male, multifemale 
units to multilevel societies. 
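
The proportions reported above can be checked with a chi-square test of independence. The text states only that a chi-square test was used, so the layout below (three social systems by selected versus not selected, tested jointly for each pathway) is an assumption; pairwise comparisons are equally plausible. The gene counts come from the text; the function names are illustrative.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_two_dof(stat):
    """Upper-tail P value of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2.0)


def selection_table(selected_counts, n_genes):
    """Rows: social systems; columns: selected, not selected."""
    return [[k, n_genes - k] for k in selected_counts]


# Counts from the text: multilevel, semi-multilevel, one-male units.
oxytocin = selection_table([22, 12, 4], 104)
dopamine = selection_table([20, 10, 3], 96)

oxt_stat, oxt_dof = chi_square_independence(oxytocin)
da_stat, da_dof = chi_square_independence(dopamine)
oxt_p = chi2_sf_two_dof(oxt_stat)
da_p = chi2_sf_two_dof(da_stat)
```

Under this assumed layout both tables have 2 degrees of freedom, which is why the closed-form tail probability exp(−x/2) can be used without a statistics library.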

In the case of the ancestral odd-nosed mon- 
keys that initially formed semi-multilevel so- 
cieties, a suite of gene changes was identified 
in the oxytocin pathway (Fig. 5B and table S32). 
These include RYR3, which showed specific 
mutations that affect oxytocin release, and 
ALOX12, which was positively selected and 
regulates downstream milk secretion (Fig. 4E 
and table S31). In the dopamine pathway, spe- 
cific variations in genes and noncoding regu- 
latory regions were identified (Fig. 5, A, C, and 
D), for example, genes that affect dopamine- 
regulation processes, such as PINK1, which is 
responsible for dopamine synthesis; SLC18A1, 
which influences dopamine transport; and 
GRIA3, which functions in reward behavior 
(fig. S23). In particular, we found that dopamine 
receptor genes DRD1 and DRD3 were rap- 
idly evolving genes in the ancestral odd-nosed 
monkey clade (table S32). These G protein- 
coupled receptors (GPCRs), which are precise 
targets located in the cell membrane, play an 
important role in binding extracellular dopa- 
mine and transmitting signals for intracellular 
downstream responses (Fig. 5D). Taken to- 
gether, these findings suggest that the oxytocin 
and dopamine pathways evolved rapidly in an- 
cestral odd-nosed monkeys, presumably in re- 
sponse to the initial aggregation required to 
form a semi-multilevel society. 

Based on these findings, we examined the 
specific amino acid changes in oxytocin and 
dopamine pathway genes in each of the ex- 
tant species of odd-nosed monkeys after their 
radiation from their common ancestor. A total 
of 22, 20, 10, and 6 genes in the oxytocin path- 
way and 20, 15, 9, and 4 genes in the dopamine 
pathway were identified in snub-nosed mon- 
keys, which represent multilevel societies; doucs 
and proboscis monkeys, which represent semi- 
multilevel societies; and pig-tailed simakobus, 
which represent one-male, multifemale units, 
respectively (SM section 5.5 and tables S31 to 
S35). For example, DRD5, which encodes a 
dopamine receptor, had specific mutations in 
extant multilevel society and semi-multilevel 
society species (Fig. 4D) that were not pres- 
ent in one-male, multifemale unit species. In 
particular, specific amino acid changes in genes 
CD38 and RYR1, which are associated with oxy- 
tocin downstream regulation, and the coding 
region of the gene OXTR were present in 
multilevel society and semi-multilevel society 
species but were absent in one-male, multi- 
female unit species (Fig. 5B and tables S31 to 
S35). By contrast, GCH1 and PRKCB, which 
are associated with dopamine synthesis and 
downstream response regulation, were se- 
lected in multilevel society species but not in 
semi-multilevel society species or one-male, 
multifemale unit species (Figs. 4E and 5, C and 
D; and table S32). Furthermore, the multilevel 
society species exhibited a shared threonine- 
to-serine mutation in DRD1, which encodes a 
dopamine receptor, in contrast to semi-multilevel 
society species or one-male, multifemale unit 
species (Fig. 4E), which do not. These genetic 
changes in the oxytocin and dopamine path- 
ways reveal changing patterns in the neurohor- 
monal regulation system that appear related to 
different levels of affiliation behavior (Fig. 2B). 
Considering the importance of receptors in 
intercellular signal transduction and intracel- 
lular downstream responses, we used GPCR-I- 
TASSER to construct three-dimensional models 
to simulate protein expression in four oxytocin 
and dopamine receptors in snub-nosed mon- 
keys, doucs, and François’ langurs, represent- 
ing a multilevel society, a semi-multilevel society, 
and a one-male, multifemale unit species, re- 
spectively (73) (Fig. 5F and fig. S22). The re- 
sults indicate that a specific amino acid change 
of valine to isoleucine, located in the sixth 
transmembrane region of DRD1, was present 
in the odd-nosed monkey clade, which repre- 
sents the ancestral aggregation from one-male, 
multifemale units to semi-multilevel societies 
(Fig. 5F). This mutation site was simulated to 
lie close to the binding pocket and thus may 
affect dopamine binding activity in the odd- 
nosed monkey clade; this mutation is not pres- 
ent in DRD1 in Asian classical langurs and 
African colobus monkeys of the subfamily 
Colobinae, which represent independent one- 
male, multifemale units (Fig. 5F). In addition, 
the specific amino acid change of threonine to 
serine in DRD1 in snub-nosed monkey species 
that live in multilevel societies was modeled to 
lie in the conserved topological domain. This 
domain, which is located in the C-terminal 
domain of the GPCR protein (fig. S22), plays 
an important function in G protein coupling 
and activation (74) and thus may enhance intra- 
cellular G protein binding in these species com- 
pared with other species of colobines (fig. S22). 
To confirm the functional expression of these 
receptors, we conducted cellular experiments 
that synthesized each sequence of DRD1 and 
OXTR of the corresponding species, and these 
were then transferred in vitro into human em- 
bryonic kidney 293 (HEK293) cells. The results 
showed that the expressed DRD1 had higher 
binding efficiency in multilevel society species 
than in semi-multilevel society species (P < 0.05; 
Fig. 5H). Furthermore, the binding efficiency of 
the expressed DRD1 in species that exhibit 
either of these types of social organization 
was significantly higher than that in species 
with an independent one-male, multifemale 
unit social organization (P < 0.05; Fig. 5H). 
This finding is consistent with the pattern 
shown by three-dimensional modeling. In addi- 
tion, OXTR had a significantly higher binding 
efficiency in multilevel society and semi- 
multilevel society species than in independent 
one-male, multifemale unit species (P < 0.05; 
Fig. 5G). These results demonstrate a correla- 
tion between increased social aggregation and 
increased binding efficiency of dopamine and 
oxytocin receptors. 

Overall, our results show integrated differ- 
ences that involve multiple genetic changes 
across various biological processes genome 
wide, which are linked to neurohormonal reg- 
ulation, including the oxytocin and dopamine 
pathways. These changes are consistent with 
differences in social organization and intermale 
tolerance in Asian colobine species and may 
underpin their ability to form large, stable, and 
cohesive groups. 


Increased behavioral affiliation is related to 
oxytocin and dopamine regulation 


To verify changes in social behavior in re- 
sponse to different levels of neurohormonal 
regulation, including the oxytocin and dopa- 
mine expression, we compared the strength of 
social affiliation among species represented by 
each of the three types of social organization. 
We constructed a behavioral dataset related to 
social affiliation that involved 17 behavioral 
categories collected from information reported 
in 45 extant species of Asian colobines (data S1, 
S2, S6, and S7). Analysis of variance (ANOVA) 
tests revealed that neighbor-male tolerance; 
interactions between one-male, multifemale 
units; and time spent in social grooming as a 
percentage of daily time budgets were signif- 
icantly higher in multilevel society species than 
in semi-multilevel society species and inde- 
pendent one-male, multifemale unit species 
(Fig. 5I). This is consistent with the expression 
results of in vitro experiments, which support 
our contention that genomic changes in the 
regulation of neurohormonal systems, includ- 
ing the oxytocin and dopamine pathways, may 
promote affiliative behaviors that are more 
pronounced in cold-adapted species. 
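
To make the ANOVA comparison above concrete, here is a minimal sketch of how a one-way F statistic is computed across three groups. The grooming percentages below are placeholder values invented purely for illustration; they are not taken from data S1, S2, S6, or S7, and the function name `one_way_anova_f` is likewise hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# PLACEHOLDER values (percent of daily time budget spent grooming), NOT real data.
placeholder_grooming = {
    "multilevel": [9.0, 11.0, 10.0],
    "semi_multilevel": [6.0, 7.0, 5.0],
    "one_male_multifemale": [3.0, 4.0, 2.0],
}
f_stat, df_between, df_within = one_way_anova_f(list(placeholder_grooming.values()))
print(f_stat, df_between, df_within)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the group means differ by more than the within-group scatter would explain.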


Conclusion 


In this study, we found that Asian colobines 
that inhabit colder environments tend to live 
in larger, more complex groups. By construct- 
ing a socioecological-genomic framework, we 
found that instead of evidence of direct adap- 
tion to current environmental conditions, 
historical patterns of dispersal, phylogenetic 
species radiations, and adaptations to ancient 
environmental conditions played a more crit- 
ical role in the social evolution of Asian colo- 
bines. Cold adaptations during ancient glacial 
events in ancestral odd-nosed monkeys ap- 
pear to have favored the selection of the neuro- 
hormonal regulation system, from neuron 
structure to signal transmission, which in- 
cludes the dopamine and oxytocin pathways. 
These changes in the dopamine and oxytocin 
pathways appear to function in strengthening 
social bonds, in facilitating male-male toler- 
ance, and in shaping social affiliation. This 
process played an important role in promoting 
social aggregation from small, independent 
one-male groups into larger multilevel socie- 
ties. Our study identifies, for the first time, a 
genomically regulated adaptation that is linked 
to stepwise social evolution in primates and 
offers new insights into the mechanisms that 
underpin diverse behavioral evolution across a 
range of animal taxa. 


Materials and methods summary 
Sequencing and assembly 


We sequenced seven Asian colobine genomes 
by using four technologies, including long-read 
sequencing of Oxford Nanopore or PacBio 
SMRT, paired-end sequencing, and high- 
throughput chromosome conformation cap- 
ture (Hi-C). Different de novo assemblies were 
performed using FALCON v.0.4.0 (75), wtdbg2 
v.2.4.1 (76), and SOAPdenovo2 v.1.0 (77) ac- 
cording to the sequencing strategy used. Ge- 
nomes with Hi-C reads were further scaffolded 
to chromosome level using LACHESIS (78) or 
3D-DNA (79). 


Dataset resources 


We compiled the datasets of social, behavioral, 
and ecological traits of Asian colobines using 
published information (SM section 2), which 
include (i) social organization, such as group 
size and composition (data S1); (ii) mating 
system (data S1); (iii) social structure, which is 
defined as social interactions and communi- 
cation, including the proportion of the activity 
budget devoted to social grooming (data S6); 
(iv) ecological (bioclimatic) variables based on 
occurrence location coordinates (data S2); and 
(v) paleoecological data based on the fossil 
record, paleoclimate, and paleogeography across 
Asia (SM section 2.2). 


Ecological analyses 


Ecological niche modeling was conducted using 
Maxent to reconstruct species distribution in 
the present climate and under paleoclimates. 
Principal components analysis was used to ex- 
tract two main characters from 19 climate 
variables for 2189 species occurrences in the 
R package Multivariate Exploratory Data Analy- 
sis and Data Mining with FactoMineR v.3.6.1 
(80). Geographic information was processed in 
ArcGIS (ArcGIS version 10.6, Environmental Sys- 
tems Research Institutes, Inc., Redlands, CA, USA). 


Reconstruction of phylogenomic relationships 


One-to-one orthologs for phylogenomic rela- 
tionship reconstruction were generated with 
OrthoFinder v.2.0.9 (81). Then, these orthol- 
ogous genes were used to generate two de- 
pendent datasets, including a concatenated 
coding sequence alignment and the fourfold 
degenerate sites. For each dataset, a tree was 
constructed with the concatenation method of 
IQ-TREE v.1.6.12 (82) and coalescent method 
of Astral v.2.0 (83), respectively. The diver- 
gence time was estimated using MCMCtree 
v.4.5 (37). 


Phylogenetic analyses 


Pagel’s λ was estimated using the R package 
GEIGER v.2.0.6 (40). The Phylo.D was esti- 
mated using the R package CAPER v.1.0.1 (50), and 
the probability of the estimated D resulting 
from the Brownian phylogenetic structure was 
marked as P_Brownian. BayesTraits v.3.0.2 (32) 
was used to infer the social system state for 
each ancestral node, which was determined by 
calculating the ancestral state posterior prob- 
ability. A random-walk Markov chain Monte 
Carlo procedure in BayesTraits v.3.0.2 was used 
to infer the correlated evolution between bio- 
climatic variables and group size. 
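
For readers unfamiliar with Pagel's λ, the sketch below shows the transformation it applies: the off-diagonal entries of a phylogenetic variance-covariance matrix (shared branch lengths) are multiplied by λ, so that λ = 1 leaves the Brownian-motion expectation unchanged and λ = 0 removes all phylogenetic covariance. This is an illustration of the definition only, not the GEIGER implementation; the three-taxon matrix and the function name `pagel_lambda_transform` are hypothetical, and restricting λ to the interval from 0 to 1 is a simplifying assumption.

```python
def pagel_lambda_transform(vcv, lam):
    """Scale the off-diagonal entries of a square covariance matrix by lam.

    Diagonal entries (root-to-tip path lengths) are left unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is restricted to [0, 1] in this sketch")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical tree ((A,B),C) with total depth 1.0; A and B share 0.6 of their history.
toy_vcv = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(pagel_lambda_transform(toy_vcv, 0.5))
```

In practice λ is estimated by maximizing the likelihood of the trait data under the transformed matrix, which this sketch does not attempt.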


Reconstruction of ancestral geographic ranges 


We reconstructed the ancestral range through 
multiple biogeographical models (e.g., DIVA, 
DEC, or BayAreaLike) using Reconstruct An- 
cestral State in Phylogenies 4.2 (84). The best 
model generated was used to reconstruct the 
range in each ancestral node. 


Demographic history reconstruction 


Demographic history was inferred using PSMC 
v.0.6.5 (85) under a hidden Markov model. 
Paired-end Illumina sequences were aligned 
to the repeat-masked genome assembly of each 
species using the Burrows-Wheeler Alignment 
tool v.0.7.17-r1188 (86). Then, consensus sequen- 
ces were generated using Sequence Alignment/ 
Map format tools v.1.3.1 (87). Each PSMC test 
was examined with 100 bootstrap replicates. 


Comparative genomics analyses 


Divergent (fast-evolving) UCNEs were iden- 
tified by using the R package GEIGER v.2.0.6 
(40). PhyloFit v.1.4 (88) and phastCons v.1.4 (39) 
were used to infer CNEs. Orthologous genes 
were constructed by using LAST v.last982 (89). 
The pairwise synteny alignment analysis was 
conducted for Asian colobine species as well as 
outgroups, with the human genome serving as 
the reference. Then, the corresponding orthol- 
ogous sequences were extracted based on the 
GFF file of the human genome. The Gene On- 
tology and KEGG pathway enrichment analy- 
ses were conducted using KOBAS v.3.0 (47). 
Selection pressure tests were implemented by 
both branch-site models and branch models 
using PAML v.4.9 (37) through a likelihood 
ratio test and strict filter criterion. An episodic 
positive selection signal was detected using 
the mixed effects model of evolution (90) im- 
plemented in Hypothesis Testing using Phy- 
logenies v.2.5.25 (71). Rapidly evolving Gene 
Ontology terms were identified following the 
evolutionary model and method proposed by 
Wang et al. (38). Specific mutations were iden- 
tified following the specific amino acid change 
pipeline from Chen et al. (72) and were further 
examined if they were located in functional 
regions using the protein families database 
Pfam v.1.6 (91). Genome-wide associations with 
social evolution were explored using PGLS re- 
gression analyses in the R package Compara- 
tive Analysis of Phylogenetics and Evolution in 
R (CAPER) v.1.0.1 (50). 
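
The likelihood ratio test mentioned above compares the log-likelihood of a null model with that of an alternative model that allows positive selection. A minimal sketch follows; it assumes one degree of freedom and a plain chi-square reference distribution, which is a common convention but is not confirmed by the text, and the log-likelihood values are hypothetical rather than PAML output from this study.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p_value) for nested models differing by one parameter.

    The statistic is 2 * (lnL_alt - lnL_null); the p-value is the upper tail
    of a chi-square distribution with 1 degree of freedom (assumed here).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, df = 1
    return stat, p_value


# Hypothetical log-likelihoods for one gene under the null and alternative models.
stat, p_value = likelihood_ratio_test(lnl_null=-12345.67, lnl_alt=-12340.12)
print(stat, p_value)
```

Across many genes, the resulting p-values would normally be corrected for multiple testing before applying a filter criterion.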


Protein structure modeling 


The 3D protein structure of the functional 
region was simulated by GPCR-I-TASSER 
(73) and then visualized using PyMOL (the 
PyMOL molecular graphics system, version 
2.0, Schrödinger, LLC). The binding cavity 
was explored with the docking simulations 
in Dock vina (92). 


In vitro expression assay 


For in vitro experiments, orthologous sequen- 
ces were synthesized by General Biosystems 
Corporation Limited (Anhui, China). All genes 
were cloned into pcDNA3.1-V5-His vector sepa- 
rately and expressed in HEK293 cells. After 
48 hours, the supernatant was removed, the 
cells were rinsed twice with phosphate-buffered 
saline (PBS), and then multiple solutions were 
added for an enzyme-linked immunosorbent 
assay experiment. Absorbance measurements 
were conducted at 370 nm within 30 min. Re- 
sults were analyzed using GraphPad Prism and 
are reported as mean ± SD. Statistical signifi- 
cance was set at P < 0.05. 


Measurement of receptor activity 


For DRD1, luciferase activities were determined 
using luciferase assay kits (Beyotime, Shanghai, 
China). In the case of OXTR, fluorescence was 
measured using microplate reader SYNERGY 
H1 (BioTek Instruments). HEK293 cells trans- 
fected with pcDNA were used as a control in all 
luciferase experiments. 


Genome-wide coancestry reveals details of ancient 
and recent male-driven reticulation in baboons 


Erik F. Sorensen et al. 


INTRODUCTION: As a widespread but compar- 
atively young clade of six parapatric species, 
the baboons (Papio sp.) exemplify a frequently 
observed pattern of mammalian diversity. In 
particular, they provide analogs for the popula- 
tion structure of the multibranched prehuman 
lineage that occupied a similar geographic 
range before the hegemony of “modern” hu- 
mans, Homo sapiens. Despite phenotypic and 
genetic differences, interspecies hybridization 
has been described between baboons at sev- 
eral locations, and population relationships 
based on mitochondrial DNA (mtDNA) do not 
correspond with relationships based on pheno- 
type. These previous studies captured the broad 
outlines of baboon population genetic structure 
and evolutionary history but necessarily used 
data that were limited in genomic and geograph- 
ical coverage and therefore could not adequately 
document inter- and intrapopulation variation. 
In this study, we analyzed whole-genome se- 
quences of 225 baboons representing all six 
species and 19 geographic sites, with 18 local 
populations represented by multiple individuals. 


RATIONALE: Recent studies have identified sev- 
eral mammalian species groups in which ge- 
netically distinct lineages have hybridized to 
generate complex reticulate phylogenies. Ba- 
boons provide a valuable context for studying 
processes generating such population and phy- 
logenetic complexity because extant parapatric 
species form hybrid zones in several regions of 
Africa, allowing for direct observation of on- 
going introgression. Furthermore, prior studies 
of nuclear and mtDNA and phenotypic diversity 
have demonstrated gene flow among differ- 
entiated lineages but were unable to develop 
the detailed picture of process and history that 
is now possible using whole-genome sequences 
and modern computational methods. To ad- 
dress these questions, we designed a study that 
would provide a more fine-grained picture of 
recent and ancient genetic reticulation by 
comparing phenotypes and autosomal, X and 
Y chromosomal, and mtDNA sequences, along 
with polymorphic insertions of repetitive ele- 
ments across multiple baboon populations. 


RESULTS: Using deep whole-genome sequence 
data from 225 baboons representing multiple 
populations, we identified several previously 
unknown geographic sites of gene flow be- 
tween genetically distinct populations. We re- 
port that yellow baboons (P. cynocephalus) 
from western Tanzania are the first nonhuman 
primate found to have received genetic input 
from three distinct lineages. We compared the 
ancestry shared among individuals, estimated 
separately from the X chromosome and auto- 
somes, to distinguish shared ancestry due 
to ancestral population relationships from 
coancestry as a result of recent male-biased 
immigration and gene flow. This reveals di- 
rectionality and sex bias of recent gene flow 
in several locations. Analyses of population 
differences within species quantified dif- 
ferent degrees of interspecies introgression 
among populations with an essentially iden- 
tical phenotype. 
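
The logic of contrasting X chromosomal and autosomal ancestry can be sketched with a standard single-generation expectation: offspring autosomes come half from fathers and half from mothers, whereas, with an equal offspring sex ratio, one third of offspring X chromosomes come from fathers and two thirds from mothers. This is a textbook approximation offered for illustration, not the ChromoPainter-based analysis used in the study; the function name and the migrant fractions are hypothetical.

```python
def migrant_ancestry_one_generation(male_migrant_frac, female_migrant_frac):
    """Expected migrant ancestry in the next generation.

    Arguments are the fractions of breeding males and breeding females that
    are migrants. Returns (autosomal_fraction, x_fraction).
    """
    autosomal = 0.5 * male_migrant_frac + 0.5 * female_migrant_frac
    x_linked = (male_migrant_frac + 2.0 * female_migrant_frac) / 3.0
    return autosomal, x_linked


# Hypothetical male-only immigration: 30% of breeding males are migrants.
autosomal, x_linked = migrant_ancestry_one_generation(0.30, 0.0)
print(autosomal, x_linked, x_linked / autosomal)
```

Under male-only immigration the X chromosome carries about two thirds as much migrant ancestry as the autosomes after one generation, so a deficit of introgressed ancestry on the X relative to the autosomes points toward male-biased gene flow.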


CONCLUSION: The population genetic structure 
and history of introgression among baboon 
lineages are even more complex than predicted 
from observed phenotypic diversity and prior 
studies of limited genetic data. Single popula- 
tions can carry genetic contributions from more 
than two ancestral sources. Populations that 
appear homogeneous on the basis of observ- 
able phenotype can display different levels of 
interspecies introgression. The evolutionary dy- 
namics and current structure of baboon popu- 
lation diversity indicate that other mammals 
displaying differentiated and geographically 
separate species may also have more-complex 
histories than anticipated. This may also be 
true for the morphologically defined hominin 
taxa from the past 4 million years. 


[Figure annotations; title fragment: “... meeting in East Africa”] 

Three species contributed to western yellow 
baboons. Migration of yellow and olive baboon males 
into the Kinda baboon range produced the population 
now considered the western yellow baboons. 

Ancient male-biased migration of yellow baboons 
into the range of the northern baboon clade resulted 
in a northern yellow baboon population sharing the 
northern baboon mtDNA. 

Recent and ongoing admixture between species 
and populations. 


Ancient and recent admixture among baboons: Complex population substructure and reticulation revealed by whole-genome sequencing. Pie charts 
represent recent ancestry of East African populations, with species contributions colored as in the inset map. Patterns of mixed ancestry differ substantially, even among 
conspecific populations. This suggests a complex history of recurrent interpopulational gene flow, driven predominantly by male migration. Comparably complex 
admixture probably also occurred among early hominins. 


Genome-wide coancestry reveals details of ancient 
and recent male-driven reticulation in baboons 


Erik F. Sorensen, R. Alan Harris, Liye Zhang, Muthuswamy Raveendran, Lukas F. K. Kuderna, 
Jerilyn A. Walker, Jessica M. Storer, Martin Kuhlwilm, Claudia Fontsere, Lakshmi Seshadri, 
Christina M. Bergey, Andrew S. Burrell, Juraj Bergman, Jane E. Phillips-Conroy, 
Fekadu Shiferaw, Kenneth L. Chiou, Idrissa S. Chuma, Julius D. Keyyu, Julia Fischer, 
Marie-Claude Gingras, Sejal Salvi, Harshavardhan Doddapaneni, Mikkel H. Schierup, Mark A. Batzer, 
Clifford J. Jolly, Sascha Knauf, Dietmar Zinner, Kyle K.-H. Farh, 
Tomas Marques-Bonet, Kasper Munch, Christian Roos, Jeffrey Rogers 


Baboons (genus Papio) are a morphologically and behaviorally diverse clade of catarrhine monkeys 
that have experienced hybridization between phenotypically and genetically distinct phylogenetic species. We 
used high-coverage whole-genome sequences from 225 wild baboons representing 19 geographic localities to 
investigate population genomics and interspecies gene flow. Our analyses provide an expanded picture of 
evolutionary reticulation among species and reveal patterns of population structure within and among 
species, including differential admixture among conspecific populations. We describe the first example 
of a baboon population with a genetic composition that is derived from three distinct lineages. The 
results reveal processes, both ancient and recent, that produced the observed mismatch between 
phylogenetic relationships based on matrilineal, patrilineal, and biparental inheritance. We also identified 
several candidate genes that may contribute to species-specific phenotypes. 


Our understanding of the evolutionary pro- 
cesses involved in the origin of biolog- 
ical diversity has changed considerably 
over the past two decades. Genetic analy- 
ses have demonstrated that hybridization 
and interspecies gene flow between closely 
related mammalian species occur more often 
than previously assumed (1, 2). Traditional 
studies of natural hybridization among pop- 
ulations and species have relied on pheno- 
typic variation and a few informative genetic 
markers (3, 4). However, access to large-scale 
genomic datasets now allows more extensive 
analyses (5-7) demonstrating that, in some 
cases, complex reticulations rather than di- 
chotomously branching phylogenetic trees more 
accurately represent evolutionary histories. 
Among primates, humans included, the num- 
ber of genera found to exhibit complex his- 
tories of interspecific reticulation has recently 
grown markedly (2, 8-12). Baboons (genus 
Papio) have long been recognized as a prime 
example of interspecies gene flow, with sev- 
eral hybrid zones between the six currently 
recognized parapatric species [Guinea baboons 
(P. papio), hamadryas baboons (P. hamadryas), 
olive baboons (P. anubis), yellow baboons 
(P. cynocephalus), Kinda baboons (P. kindae), 
and chacma baboons (P. ursinus); Fig. 1; for 
the rationale behind the classification of these 
major forms as species rather than subspecies, 
see (13)] (14-17). Previous analyses have iden- 
tified substantial discrepancies in species-level 
phylogenies inferred using information from 
nuclear DNA, mitochondrial DNA (mtDNA), 
and phenotypes, indicating para- and poly- 
phyletic relationships and suggesting a com- 
plex history of differentiation and admixture 
(18-21). Recent comparisons of whole-genome 
sequence (WGS) data across Papio species il- 
lustrated the extent of genetic exchange be- 
tween phenotypically distinct species (22-25). 


These studies were, however, restricted to one 
or two populations per species and therefore 
unable to analyze wider geographic patterns 
of genetic diversity or compare the local effects 
of interspecific contact. 

This study provides a detailed WGS-based 
analysis of coancestry and genomic exchange 
across all six baboon species, including multi- 
ple populations within olive and yellow baboons. 
We generated deep [>30×; table S1 (23)] WGS 
data from 225 wild baboons representing 19 
localities (Fig. 1 and table S2), describing 
variation within and among localities for 
autosomes, X and Y chromosomes, mtDNA, 
and other genetic features such as insertions 
of Alu repeats and long interspersed elements 
(LINEs). In addition to analyzing population 
structure using autosomal single-nucleotide var- 
iants (SNVs) and repetitive elements, we com- 
pared coancestry inferred from autosomal and 
X chromosomal data to reveal sex-biased ef- 
fects on genetic population structure. Our results 
provide the most extensive analysis of genetic 
diversity in baboons to date and reveal processes, 
both recent and in the distant past, that resulted 
in the discrepancies documented among the 
phylogenetic relationships based on matri- 
lineal, patrilineal, and biparental inheritance. 
The evidence indicates the radiation that pro- 
duced the six extant species began more than 
1 million years ago. The lineages that diverged 
around that time have since experienced exten- 
sive admixture, as reflected in their current gene- 
tic composition. We suggest that these findings 
inform predictions for similar systems such as 
hominin and early human evolution, for which 
baboons have long been recognized as a model 
(26-29). 


Results 


Fig. 1. Distribution of the six baboon species and sampling sites. Species distributions are modified from (20). The inset map shows sampling sites in Tanzania. 
Numbers of samples per species are given in parentheses. [Illustrations of male baboons by Stephen Nash, used with permission] 

WGS analysis across multiple populations of 
baboons provides a fine-grained picture of 
present-day population structure and the evo- 
lutionary history that generated it. Results of 
this analysis also document additional locations 
of ongoing admixture among genetically dis- 
tinct lineages. Our analyses of SNVs strongly 
support the existence of differentiated clades 
including the six recognized species, despite 
well-known hybrid zones between parapatric 
species. The initial divergence of evolution- 
ary lineages separates the three northern spe- 
cies (hamadryas, olive, and Guinea baboons) 
from the three southern species (Kinda, yel- 
low, and chacma baboons). Analyses of pop- 
ulation structure (Fig. 2, A to C, and figs. S1 
to S4) and phylogenomic maximum-likelihood 
(ML) trees using autosomal, X and Y chromo- 
somal, and mtDNA data (figs. S5 to S8) are 
consistent with the initial north-south split 
and with greater overall divergence among 
southern than northern baboons [see also (23)]. 
Principal components analyses (PCAs) and ML 
trees of autosomal and X chromosomal data 
separate the western Tanzanian yellow baboons 
located at Mahale and Katavi into their own 
cluster distinct from eastern Tanzanian yellow 
baboons from Mikumi, Selous, Ruaha, and 
Udzungwa as well as from Kinda baboons. 
However, the Y chromosomal phylogenies, in- 
cluding one based on Alu insertions (fig. S9), 
show six main clusters largely corresponding to 
the six species and place most western yellow 
baboons with Kinda baboons. Other western 
yellow baboons cluster in that analysis with 
eastern yellow and one olive baboon, provid- 
ing a clear example of admixture processes 
not revealed by the whole-genome phylogeny. 

Across the genome of each individual, we 
identified the most recent coancestry among 
all other sampled individuals [using Chromo- 
Painter (30)]. The corresponding first two 
principal components (Fig. 2C) show exten- 
sive variation among yellow baboons and 
confirm the primary north-south split. This 
split is also apparent in the clustering using 
fineSTRUCTURE (30) (Fig. 2B). ML trees for 
autosomes and X and Y chromosomes (figs. 
S5 to S7) all support the conclusions reached 
by PCA, with two individuals falling outside 
their expected species clades [samples PD0266 
and PD0662, also anomalous in the PCAs; figs. 
S1, S2, S10, and S11 (13)]. As discussed below, 
the Y chromosomal phylogeny places Kinda 
baboons basal to all others (fig. S7). 
Unsupervised cluster algorithms group indi- 
viduals largely by species (see ADMIXTURE 
analysis; Fig. 2D and fig. S12) with K = 7 as 
the preferred number of clusters. However, 
in species for which we sampled more than 
one population (olive and yellow baboons), we 
find local genetic differences and evidence for 
a complex evolutionary history (detailed dis- 
cussion below). These results are also sup- 
ported by an analysis of LINE-1 (L1) insertions 
(fig. S13), an independent class of genetic 
marker that is less prone to parallel mutations. 
The pelage phenotypes on which taxonomy 
was traditionally based are generally very con- 
sistent within species over wide geographic 
ranges (31). Yet we find high genomic varia- 
tion within and among conspecific popula- 
tions. Heterozygosity ranges from 0.0006 to 
0.0026 (average: 0.0018) per base pair across 
the six species, and from 0.0006 to 0.0029 
across the 19 localities, with the lowest values 
in Guinea baboons (table S3 and figs. S14 to 
S17). The coancestry matrix and its PCA (Fig. 2, 
B and C) differentiates the various sampling 
localities and is therefore consistent with the 
ADMIXTURE analysis (Fig. 2D), showing that 
the sampled populations within both yellow 
and olive baboons can be distinguished ge- 
netically. The yellow baboons in Mikumi (Fig. 
2B, box H) share pelage and morphological 
phenotypes with those in Ruaha despite being 
genetically distinct. Western yellow baboons 
from Mahale and Katavi (Fig. 2B, box F) ex- 
hibit phenotypic traits (somewhat smaller body 
size than Mikumi baboons, especially in terms 
of cranial metrics; aspects of coat color, with 
some individuals having pink skin around the 
eyes and sporadic occurrence of white-furred 
infants) in which they resemble Kinda baboons 
(32). The coancestry matrix (Fig. 2B) further 
shows that yellow baboons from Mahale and 
Katavi (box F) exhibit greater genetic similar- 
ity with Kinda (box E) and chacma baboons 
(box G) than with their supposed conspe- 
cifics from eastern Tanzania (box H). Simi- 
larly, all olive baboons (except for those from 
Tarangire) share a very consistent pelage and 
external phenotype. However, ADMIXTURE 
(Fig. 2D) and ChromoPainter (Fig. 2B) analy- 
ses identify clear evidence of genetic differences 
between the Ethiopian Gog olive baboons and 
the Tanzanian olive baboons of Lake Manyara 
and Ngorongoro. Furthermore, the Serengeti 
population is more similar genetically to both 
the Gombe and Aberdare populations than to 
the Ngorongoro or Lake Manyara populations, 
which are geographically much closer. 
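The per-base-pair heterozygosity values quoted above can be illustrated with a minimal sketch: the number of heterozygous genotype calls in one individual divided by the number of callable sites. The function name, the genotype encoding, and the toy numbers are assumptions for illustration only and do not reproduce the pipeline used in this study.

```python
def heterozygosity(genotypes, callable_sites):
    """Fraction of callable sites at which one individual is heterozygous.

    genotypes: iterable of (allele_a, allele_b) tuples at variant sites.
    callable_sites: total number of confidently called sites, variant or not.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    het = sum(1 for a, b in genotypes if a != b)
    return het / callable_sites


# Hypothetical toy data: 3 heterozygous calls among 2,000 callable sites.
calls = [(0, 1), (1, 1), (0, 1), (1, 0), (1, 1)]
print(heterozygosity(calls, 2000))
```

With these toy inputs the estimate falls inside the range reported for the six species; real estimates depend on how callable sites are defined.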

We used the SNV data to reconstruct the 
history of population size for each baboon 
locality (Fig. 3A and figs. S18 to S21). The 
estimated effective population sizes (N_e) were
all essentially the same and on the order of
100,000 until about 1.0 million to 1.2 million
years ago, which is consistent with the prior
dating of the initial north-south divergence
(23). At the separation, the N_e of northern
populations fell below that of the southern 
populations, supporting the idea that the genus 
arose in southern Africa, and a daughter pop- 
ulation from this basal stock spread to the 
north, then to the west, losing genetic diver- 
sity in serial founding events. The suggestion 
that Guinea baboons represent the descend- 
ants of those groups that were at the leading 
edge of that dispersal for the longest distance 
and time (33) is supported by the lower het- 
erozygosity in that sample relative to all other 
baboon species (table S3). Also, whole-genome
Alu and L1 insertion-based phylogenies place
western yellow baboons with Kinda baboons,
whereas Guinea baboons are basal among baboons,
and hamadryas baboons are the sister
taxon to olive and southern baboons (figs.
S22 and S23). These findings may result from
Guinea baboons and, to a lesser extent, hamadryas
baboons losing polymorphic derived
Alu and L1 insertions through drift as they
dispersed north from the southern geographic
origin (34).

Fig. 2. Population structure and coancestry of the six baboon species.
(A) PCA of autosomal SNVs. (B) ChromoPainter coancestry matrix with
fineSTRUCTURE dendrogram. Each row in the coancestry matrix represents
an individual and illustrates how its most recent common ancestry is distributed
across all other sampled individuals. The ordering of individuals is the same
for rows and columns. The row color labels are the same as in (A) and
correspond to clusters shown for eight populations labeled with boxes:
A, Gog olive (Ethiopia); B, hamadryas; C, Guinea; D, southern olive (Kenya and
Tanzania); E, Kinda; F, western yellow; G, chacma; H, eastern yellow; X, olive
coancestry in western yellows suggesting admixture (see alternate fineSTRUCTURE
figure, fig. S4). Color labels below the dendrogram represent the 14 groups
named in the figure legend. (C) PCA of the coancestry matrix. (D) ADMIXTURE
plot with the preferred grouping of baboons into seven clusters (K = 7; for
K = 2 to 10, see fig. S12).

Earlier studies provided clear evidence for
hybridization and gene flow across the contact
zones between pairs of parapatric species
(15-17, 24, 25, 35). In this study, we present
evidence for additional ancient and recent
arenas for gene flow between species pairs.
Species tree reconstruction [ASTRAL (36)]
using window-based ML trees (50- and 500-kb
window size) produced inconsistent branching
patterns among datasets, and only 58 to


70% of gene trees fit the species tree at the 
quartet level (figs. S24 and S25). Both in- 
complete lineage sorting (ILS) and gene flow 
are likely contributing to this discordance, 
which is expected to be larger for smaller win- 
dows. In addition, a qualitative visualization of 
these trees (figs. S24 and S25) shows a network-like
pattern, again indicating complexity. There
is greater shared genetic drift (measured by f3
outgroup statistics) among eastern yellow 
baboon localities (Udzungwa, Selous, Mikumi, 
Ruaha), whereas western yellow baboons tend 
to cluster with Kinda baboons (fig. S26). In 
admixture graphs (Fig. 3B), Kinda baboons 
are, similarly to the description in (23), rep- 
resented as a fusion product of populations 
from southern and ancestral northern clades, 
whereas the western yellow baboons share 
ancestry with both Kinda and olive baboons.
More complex graphs (tables S4 and S5 and
figs. S27 to S29) might be supported, but they
failed to give replicable results, likely owing to
complex reticulation and multiple gene flow
events at different times and between different
local populations, which now obscure the
processes involved.

Fig. 3. Population history and complex reticulation between baboon
populations. (A) MSMC2 plots using a mutation rate of 0.9 x 10^-8 and a
generation time of 11 years (23). (B) Admixture graph of the populations used
in this study, based on 48,730,011 single-nucleotide variants with data
for all individuals, and a predefined number of two admixture events. Numbers
on solid branches correspond to the estimated drift in f2 units of squared
frequency difference; labels on dotted edges give admixture proportions.
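The shared genetic drift measured by outgroup f3 statistics can be illustrated with a naive estimator: the mean, over SNVs, of the product of allele-frequency differences between an outgroup and each of two test populations. This is a simplified sketch with hypothetical allele frequencies; it omits the sample-size bias corrections and standard errors that dedicated software applies, and the function name is an assumption.

```python
def outgroup_f3(p_out, p_a, p_b):
    """Naive outgroup f3: mean over SNVs of (p_out - p_a) * (p_out - p_b).

    Larger values indicate more drift shared by populations A and B
    since their split from the outgroup.
    """
    if not (len(p_out) == len(p_a) == len(p_b)) or not p_out:
        raise ValueError("frequency lists must be non-empty and equal in length")
    total = sum((o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b))
    return total / len(p_out)


# Hypothetical allele frequencies at four SNVs.
out = [0.0, 0.0, 1.0, 0.0]
pop_a = [0.5, 0.0, 0.5, 1.0]
pop_b = [0.5, 0.0, 0.5, 0.0]  # drifted together with pop_a at SNVs 1 and 3
pop_c = [0.0, 0.5, 1.0, 0.0]  # no drift shared with pop_a

f3_ab = outgroup_f3(out, pop_a, pop_b)
f3_ac = outgroup_f3(out, pop_a, pop_c)
print(f3_ab, f3_ac)
```

In this toy case the pair A and B yields the larger statistic, which is the pattern that groups localities sharing more recent common history.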

Taken as a whole, this expanded dataset 
does not support the previous suggestion that 
Kinda baboons result from a recent fusion 
event (23) as shown in Fig. 3B. In PCA plots 
using genome-wide SNVs, Kinda baboons do 
not fall intermediate between northern and 
southern clades but in fact are quite distinct 
(Fig. 2A and figs. S1 and S2). Some ML trees 
(i.e., Y chromosome data; fig. S7) place Kinda
baboons as a sister clade to all other baboons, 
whereas other trees (autosomes and X chro- 
mosome data; figs. S5 and S6) lump them to- 
gether with yellow and chacma baboons into 
the southern clade. These results are more con- 
sistent with the idea that Kinda baboons show 
substantial genetic similarity to both northern 
and southern clade baboons because they are 
basal and phenotypically resemble the an- 
cestral form from which all extant species are 
derived. Fossil evidence suggests a southern 
African origin for baboons (34), and the mtDNA 
haplotypes of Kinda and western yellow ba- 
boons (Fig. 4 and fig. S8) (21) suggest that 
their range in tropical southern Africa may 
include the area of origin of both northern
and southern primary branches. Broader aspects
of Y chromosome data also do not support
Kinda baboons as a fusion product; Kinda
baboon Y haplotypes are found in western
yellow baboons but not in olive baboons, and
no olive baboon mtDNA has been observed
in any Kinda baboon to date. Finally, Kinda
baboons share more polymorphic Alu insertions
with geladas than do other Papio species,
possibly the result of a period of coexistence
and hybridization between their ancestors (37).

Fig. 3 (continued). (C) Globetrotter analysis of the eight major regional populations. The pie
chart for each cluster shows ancestry contributions from other clusters.
Expanded wedges represent ancestry that can be attributed to recent admixture
(<56 generations, bootstrap P < 0.05). (D) Same as (C), but for 14 populations
separating each major sampling location (here, expanded wedges represent
ancestry that can be attributed to admixture more recent than 95 generations,

Fig. 4. Geographic distribution of mtDNA clades and mtDNA phylogeny.
(A) Distribution ranges of baboon species and the four main mtDNA clades
(south, southeast, northeast, northwest, dashed lines) including major
mitochondrial lineages (A to R). (B) Phylogeny based on complete mtDNA
genomes (see also fig. S8). Clade designation follows (20, 21), and asterisks
indicate lineages from which mtDNA genomes have been generated in this
study. For identical haplotypes, see table S7.

We analyzed the genetic relationships among
the eight major regional baboon populations
that constitute our samples: the four single-locality
populations of chacma, Kinda, hamadryas,
and Guinea baboons and two groups each of
yellow (western and eastern) and olive (Gog 
and southern) baboons. By modeling the re- 
cent ancestry along the chromosomes of in- 
dividual baboons [Globetrotter (38)], we can 
represent each group as a mixture of recent 
ancestry with the remaining seven groups 
(Fig. 3C). In most of the groups, we can iden- 
tify a contribution from recent admixture 
events (the oldest identifiable event estimated 
at 56 generations; table S6) separate from 
contributions of older admixture and reten- 
tion of ancestral polymorphism (bootstrap P < 
0.01 unless otherwise noted). In Fig. 3, C and 
D, we distinguish the recent admixture from 
more-ancient shared ancestry by showing the 
recent admixture estimates as expanded (ex- 
ploded) wedges. 

We identified a large amount of shared an- 
cestry between southern olive and eastern yel- 
low baboons not concordant with the overall 
phylogeny (Fig. 3C). This is also expressed in 
the coancestry matrix (Fig. 2B, box X) and is 
additional evidence of persistent admixture 
between both species (15, 17, 22, 25). Further- 
more, western yellow baboons from Mahale 
and Katavi share substantial ancestry with 
eastern yellow, Kinda, and southern olive ba- 
boons. This cannot be explained as a retention 
of ancient shared variation present before the 
origin of the six major branches, because there 
is no equivalent sharing with chacma, hama- 
dryas, or Guinea baboons. This finding is, 
therefore, the first evidence that a single pop- 
ulation (western yellow baboons) contains 
measurable admixture contributions from more 
than two distinct lineages. Comparing the an- 
cestry of recently admixing populations (ex- 
panded wedges in Fig. 3C) to that of each 
other group identifies recent admixture from 
Gog into southern olive baboons, between west- 
ern and eastern yellow baboons, from southern 
olive baboons into eastern yellow baboons 
(P = 0.04), between Kinda and chacma baboons
(P = 0.02), and between Kinda and
western yellow baboons. Repeating the Globe- 
trotter analysis assuming 14 populations 
representing all major sampling locations dif- 
ferentiates olive and yellow baboon popula- 
tions (Fig. 3D) and reveals a complex system 
of recent gene flow (all events < 95 genera- 
tions) between: (i) olive baboon populations, 
(ii) yellow baboon populations, (iii) yellow and 
Tarangire olive baboons, (iv) western yellow 
and Gombe olive baboons, and (v) Tarangire 
olive baboons and Ruaha yellow baboons. These 
results do not imply direct migration of males 
(e.g., individual males moving from Gog to 
Serengeti) but rather, more plausibly, the over- 
all consequences of many incremental gene 
flow events distributing alleles long distances 
over multiple generations. 

This is not the first study to suggest that the 
history of genetic differentiation and reticula- 
tion among baboons is complex. Previous studies 
(9, 18-21, 33, 39, 40) showing widespread
phenotype-mitochondrial discordance strong- 
ly suggest that nuclear swamping (i.e., the 
immigration of males into a phenotypically 
different population, largely or completely 
displacing the nuclear DNA composition and 
phenotype of the invaded population, without 
changing its mtDNA composition) has been a 
major contributing process. The present study 
found a similar discordance between the ex- 
panded mtDNA phylogeny (Fig. 4 and fig. S8) 
on the one hand and the new autosomal and 
Y-chromosomal phylogenies generated in this 
study on the other (figs. S5 and S7). Thus, our 
WGS findings strongly support previous sug- 
gestions, based only on mtDNA and pheno- 
type data, that nuclear swamping has been a 
major factor generating the current pattern 
of baboon genetic and phenotypic variation. 

The dense sampling of mtDNA provides im- 
portant information about matrilineal ancestry. 
However, as a single locus, mtDNA represents
only one of many possible genealogies generated 
by ILS and admixture. To test the hypothesis 
that nuclear swamping produced the discord 
observed between mtDNA phylogenies and 
relationships derived from comparisons of 
phenotype, we contrasted ancestry propor- 
tions across the X chromosome and the sim- 
ilarly sized chromosome 8, each contributing 
thousands of individual genealogies. Admixture 
by hemizygous males introduces disproportion- 
ately more autosomal than X chromosomal 
sequence, rendering shared X chromosome 
ancestry a better representation of deep spe- 
cies relationships before admixture. We found 
that the X chromosome of our chacma baboons 
derives more ancestry from yellow baboons 
than their chromosome 8 does (0.47 versus 
0.62, paired t test, P = 0.005; Fig. 5A), suggesting 
that male-biased admixture from the ancestors 
of chacma baboons into the southern range of 
yellow baboons produced northern chacma baboons,
including the grayfooted chacma baboons
(P. ursinus griseipes) that we analyze in
this study. This observation is consistent with
the close relationship between mtDNA found
in southernmost yellow and northern chacma
baboons (clade B in Fig. 4) (9, 40). The most
compelling evidence of male-biased admixture 
is the relationship between western yellow 
and Kinda baboons. The ancestry profile of 
western yellow baboons (Fig. 5B) is very dif- 
ferent from eastern yellow baboons (Fig. 5C). 
Western yellow baboons share more ancestry 
with Kinda baboons on the X chromosome 
than on chromosome 8 (0.27 versus 0.44, paired 
t test, P = 0.025), whereas Kinda baboons con- 
tain twice as much western yellow baboon 
ancestry on the X chromosome as on chromo- 
some 8 (0.23 versus 0.55, paired t test, P = 1.8 x 
10°”; Fig. 5D). Furthermore, eastern yellow 
baboons share more X chromosomal an- 
cestry with western yellow baboons than 
chromosome 8 ancestry (0.16 versus 0.20,
paired t test, P = 3.1 x 10°; Fig. 5B). Together 
these observations indicate that western yellow 
baboons were produced mainly from males 
carrying haplotypes that originated among 
eastern yellow and southern olive baboons mi- 
grating into the ancestral range of Kinda ba- 
boons, replacing Kinda baboon autosomes more 
than they replaced Kinda baboon X chromo- 
somes. As a result, western yellow baboons 
carry genetic input from three distinct lineages. 
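The logic of the X-versus-autosome contrast can be sketched numerically. Under a simplified single-pulse model with an equal breeding sex ratio (an assumption made here for illustration, not the model fitted in this study), immigrant males contribute half of the autosomes but only one-third of the X chromosomes in the next generation, so male-biased admixture leaves proportionally more resident ancestry on the X chromosome. The function and parameter names are hypothetical.

```python
def expected_admixture(male_frac, female_frac):
    """Expected immigrant ancestry after one pulse of admixture.

    male_frac / female_frac: fraction of breeding males / females in the
    recipient population that are immigrants (hypothetical parameters).
    Returns (autosomal_fraction, x_linked_fraction) in the offspring.
    """
    for frac in (male_frac, female_frac):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    # Autosomes: half inherited from fathers, half from mothers.
    autosomal = (male_frac + female_frac) / 2
    # X chromosomes: one-third transmitted by fathers, two-thirds by mothers.
    x_linked = (male_frac + 2 * female_frac) / 3
    return autosomal, x_linked


# Purely male-mediated admixture: 60% of breeding males are immigrants.
auto, x = expected_admixture(male_frac=0.6, female_frac=0.0)
print(auto, x)
```

When only males immigrate, the X-linked immigrant fraction is two-thirds of the autosomal fraction; when both sexes immigrate equally, the two fractions coincide.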

In addition to patterns of shared ancestry 
among populations and species, we used two 
strategies to seek preliminary evidence for 
species-specific genetic adaptations in baboons. 
First, we used PLINK (41) to identify SNVs
enriched in one species relative to all others 
(table S8). Genes containing possibly func- 
tional SNVs enriched in a given taxon were 
correlated with species phenotypes using Gene 
Ontology (GO) (42) terms and literature searches. 
We also used OmegaPlus (43) to test those 
gene regions for evidence of selective sweeps. 
Across all species, 1,342,371 SNVs met the 
criteria for being enriched in one particular species,
including 4337 missense and 76 stop-gained
SNVs (table S8). We next searched this
list of candidates for genes annotated as in- 
fluencing known traits of that species. Among 
them, SNV_1 (Table 1 and fig. S32), a missense 
variant in serine protease 8 (PRSS8), has a 0.96 
allele frequency (AF) in hamadryas baboons 
and a 0.02 AF in the geographically adjacent 
Gog olive baboons (absent in other species). 
PRSS8 increases epithelial sodium channel 
activity and mediates sodium reabsorption 
through the kidneys (44). PRSS8 is under pos- 
itive selection in the desert-adapted canyon 
mouse (Peromyscus crinitus) (45), and hamadryas 
baboons inhabit the most arid environment of all 
baboons (46). SNV_2 (Table 1 and fig. S33) has 
a 1.0 AF in both hamadryas and Guinea ba- 
boons and is absent from other species. This is a 
missense variant in neurexin 1 (NRXN1), which
is associated with the GO term “social behavior.”
Nrxn1 knockout mice exhibit changes in male
aggression (47). Guinea and hamadryas baboons 
differ from others in the genus in exhibiting a 
multilevel male-philopatric social organization 
with substantial male-male tolerance (29, 48). 


This contrasts with the matrilineal, male-dispersing
social organization typical and likely
ancestral for the genus. This observation is com- 
patible with the speculation that until “swamped” 
by males from olive and yellow baboon popu- 
lations, male-philopatric “pre-Guinea” and “pre- 
hamadryas” baboon populations occupied the 
northern savanna-woodland belt and much of 
the East African savanna-woodland corridor (33). 
SNV_3 (Table 1 and fig. S34) has a 1.0 AF in 
Kinda baboons and a 0.05 AF in yellow ba- 
boons (western yellow baboons and Ruaha) 
and one Serengeti olive baboon. This is a mis- 
sense variant in the pigmentation-associated 
agouti signaling protein (ASIP). In mice, this
gene affects melanin synthesis, shifting eu- 
melanin production (black and brown hair) to 
phaeomelanin (red and yellow hair) (49). Kinda 
baboons display several distinctive coat color 
traits, including a substantial proportion of 
infants with white natal coats (6). 
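The idea of a species-enriched SNV can be sketched with a simple allele-frequency filter: a variant counts as enriched when it is near fixation in one taxon and rare or absent elsewhere. This is only a threshold-based simplification with hypothetical cutoffs and counts; the study itself used PLINK association tests, which this sketch does not reproduce.

```python
def enriched_in(target, counts, min_target_af=0.9, max_other_af=0.1):
    """Flag an SNV as enriched in `target`.

    counts: {taxon: (alt_allele_count, chromosomes_genotyped)}.
    min_target_af / max_other_af: hypothetical cutoffs for illustration.
    """
    afs = {taxon: alt / n for taxon, (alt, n) in counts.items() if n > 0}
    if target not in afs:
        return False
    others = [af for taxon, af in afs.items() if taxon != target]
    return afs[target] >= min_target_af and all(af <= max_other_af for af in others)


# Hypothetical counts chosen to give allele frequencies of 0.96, 0.02, and 0.
snv_counts = {
    "hamadryas": (48, 50),
    "gog_olive": (1, 50),
    "kinda": (0, 30),
}
print(enriched_in("hamadryas", snv_counts))
print(enriched_in("gog_olive", snv_counts))
```

An association test has the advantage over fixed cutoffs of accounting for sample size, which matters when some taxa are represented by few individuals.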

Fig. 5. Differential ancestry profiles on the X chromosome and an autosome. (A) Ancestry proportions of female chacma baboons. Each marker represents
the fraction of total chromosome ancestry of one individual that is assigned to each of the remaining donor populations. Black dots and gray crosses represent
ancestry proportions of chromosomes 8 and X, respectively. (B) Same as (A), but for female western yellow baboons. (C) Same as (A), but for female eastern yellow
baboons. (D) Same as (A), but for female Kinda baboons. For additional profiles, see figs. S10, S30, and S31.


Table 1. Species enriched SNV statistics. Cluster and OmegaPlus statistics for the hamadryas and Guinea baboon shared SNV_2 are shown for hamadryas
baboons. CADD and REVEL scores from human annotations predict functional impact of mutations (see supplementary materials).

SNV ID | SNV | PLINK P value | Cluster length (base pairs) | SNVs in cluster | OmegaPlus (percentile) | CADD PHRED | REVEL
SNV_1 | 20:27347531 [alleles illegible] | [illegible] | [illegible] | [illegible] | 9.99 (18%) | [illegible] | [illegible]
SNV_2 | 13:49896439 [alleles illegible] | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | N/A
SNV_3 | 10:30107617:T:C | [illegible] | [illegible] | 58 | 4.92 (0.7%) | 19.140 | 0.080


In our second approach to functional variation,
we searched for genomic regions of elevated
differentiation between pairs of closely related
species [for details, see (13)]. We sought to
determine whether regions with the strongest
evidence of differentiation (windows in the 
top 0.1%) were enriched for genes with par- 
ticular GO terms. Genomic regions most dis- 
tinct between Kinda and yellow baboons were 
enriched for genes linked to skeletal develop- 
ment and morphogenesis (P value adjusted 
for false discovery rate, P = 1.77 x 10~*; tables 
S9 to S11 and fig. S35), including limb de- 
velopment (e.g., embryonic forelimb morpho- 
genesis, adjusted P = 0.02). This enrichment 
was driven by one region on chromosome 3 
containing a HOXA gene cluster (fig. S36) and 
may influence the distinctively small size and 
gracile, long-limbed build of Kinda baboons 
(16). Genes linked to male sexual differenti- 
ation were also increased in regions highly 
differentiated between Kinda and yellow ba- 
boons (adjusted P = 0.0484), possibly related 
to the reduced sexual dimorphism in Kinda 
baboons (50). 
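Selecting the most differentiated windows (the top 0.1%) amounts to ranking windows by their differentiation score and keeping the upper tail. The sketch below shows that step on hypothetical window scores; the function name and the toy data are assumptions, and the computation of the differentiation statistic itself is not shown.

```python
def top_fraction_windows(window_scores, fraction=0.001):
    """Return (window_id, score) pairs in the top `fraction` by score.

    window_scores: {window_id: differentiation score}.
    At least one window is always returned.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(window_scores.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical scores for 2,000 genomic windows.
scores = {f"win{i}": i / 2000 for i in range(2000)}
outliers = top_fraction_windows(scores)
print([window for window, _ in outliers])
```

Genes overlapping the retained windows would then be passed to a GO enrichment test, with the remaining windows supplying the background.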


Discussion 


Our expanded whole-genome dataset provides 
several insights into genetic reticulation and 
the evolutionary history of multiple local pop- 
ulations of baboons. Previous work showed 
that gene flow occurs among phenotypically 
and genetically distinct baboon species and 
pointed to nuclear swamping as a major con- 
tributing process. Our study extends and adds 
higher resolution to this picture, using genetic 
data to confirm hybrid zones that were pre- 
viously suspected from field observation of 
phenotypic variation alone. We also identify 
the first local population (western Tanzanian 
yellow baboons) that has clear evidence for 
genetic contributions from three genetically 
distinct lineages. 

While our results substantially extend our 
knowledge of baboon evolutionary history, 
some gaps remain. The richness of evolution- 
ary detail to be derived from denser sampling 
is indicated by our results from East African 
populations. More extensive genetic surveys 
are needed to document other regions with 
complex biogeographic and evolutionary his- 
tory, including the olive-Guinea baboon inter- 
face in West Africa (27), and regions of southern 
Africa where chacma baboons have experienced 
both ancient and recent periods of genetic diver- 
gence and reticulation (39, 40). Other geo- 
graphic regions, for example, the northern 
savanna-woodland belt west of our Gog pop- 
ulation, have not been studied and would 
likely provide further information, especially 
regarding the origins and history of olive and 
Guinea baboons. Nevertheless, our dense sam- 
pling in East Africa clearly identifies previously 
unknown arenas of gene flow and documents 
the complexity of the evolutionary history of 
baboons in this region. 

Our results lead to several substantive con- 
clusions. With regard to methods, we find 
that while comparison of mtDNA and phenotypic
variation is effective in detecting nuclear
swamping, analyses comparing levels of
shared ancestry across the X chromosome to 
that across autosomes provide a more quan- 
titative assessment of demographic processes 
and genetic history. Second, we conclude that 
Kinda baboons are not the product of a re- 
cent fusion event. Instead, they are more likely 
close to the basal ancestor of all extant ba- 
boons. Next, we find additional support for 
the prior observation that the primary sepa- 
ration of northern and southern baboon spe- 
cies is the result of dispersal from the south 
to the north, with Guinea baboons recognized 
as the most recent occupants of the leading 
edge of that dispersal. Despite the sharp gra- 
dient of phenotypes that is characteristic of 
baboon interspecies contact zones, gene flow 
distributes the introgressed alleles far from 
the regions of obvious hybridization. And fi- 
nally, we report that extant western yellow ba- 
boons carry genetic contributions from three 
genetically different baboon lineages. 

The patterns of local, regional, and species- 
level genetic structure in baboons are likely 
a valuable model for population structure 
in other primate clades that consist of multi- 
ple closely related species, such as African 
green monkeys [genus Chlorocebus (51)] and 
macaques [genus Macaca (52)]. Clades in other 
mammalian orders are also revealing complex, 
often reticulated, evolutionary histories similar 
to those of baboons [e.g., polar bears (53, 54), 
giraffes (7), and deer (55)]. The results for ba- 
boons also provide informative parallels and 
contrasts to the evolutionary differentiation 
and relationships among early human ances- 
tors that arose, differentiated, and admixed over 
a time span remarkably similar to that of ba- 
boon cladogenesis (56). 


Materials and methods summary 


Extended materials and methods are presented 
in the supplementary materials. Descriptions of 
procedures used for sampling baboons in the 
wild, preparing and sequencing genomic libra- 
ries, analyzing variation among animals, and 
inferring phylogenetic relationships, as well as 
other aspects of study methods, are provided. 


Samples and DNA sequencing 


Blood samples from 225 baboons and two gela- 
das were gathered in accordance with local reg- 
ulations. Genomic DNA was extracted from 
blood, and libraries were prepared for sequenc- 
ing on the NovaSeq 6000 platform (Illumina). 


Variant calling and phasing 


We used BWA-MEM to map reads to the 
Panu_3.0 baboon and the Mmul_10 rhesus 
assemblies. GATK was used to call variants 
following best practices. Panu_3.0 SNVs were 
phased using WhatsHap and SHAPEIT. 


Population structure and phylogenetic analyses


Population structure based on SNVs was ex- 
amined using PCA, ADMIXTURE, and fast- 
STRUCTURE. Phylogenetic trees based on 
autosomal and sex chromosome SNVs and 
Geneious assembled mitochondrial genomes 
were generated using IQ-TREE and visualized 
with FigTree. Polymorphic mobile elements were 
identified using DELLY and MELT. STRUCTURE
and MELT were used to analyze population
structure of L1 and Alu elements. PAUP
was used to generate maximum parsimony trees 
from Alu and L1 elements. We used MSMC2 
to infer baboon demographic history and pop- 
ulation structure through time. Admixture 
graphs and f3 outgroup statistics were generated
using ADMIXTOOLS 2.
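MSMC2 reports time and coalescence rate in scaled units, which are converted to years and effective population size using a mutation rate and a generation time. The sketch below applies the standard conversion, assuming a per-site, per-generation mutation rate of 0.9 x 10^-8 and the 11-year generation time given in the Fig. 3 legend; the function name and the example row are hypothetical.

```python
def scale_msmc2(left_time_boundary, lambda_rate, mu=0.9e-8, gen_years=11.0):
    """Convert one MSMC2 output row to (years ago, effective population size).

    left_time_boundary: scaled time from the MSMC2 output.
    lambda_rate: scaled coalescence rate from the MSMC2 output.
    mu: assumed mutation rate per site per generation.
    gen_years: assumed generation time in years.
    """
    if lambda_rate <= 0 or mu <= 0:
        raise ValueError("lambda_rate and mu must be positive")
    years = left_time_boundary / mu * gen_years
    n_e = (1.0 / lambda_rate) / (2.0 * mu)
    return years, n_e


# Hypothetical row chosen to land near the values discussed in the text.
years, n_e = scale_msmc2(left_time_boundary=9e-4, lambda_rate=1 / 0.0018)
print(round(years), round(n_e))
```

With these inputs the row corresponds to roughly 1.1 million years ago and an N_e of about 100,000; both outputs scale inversely with the assumed mutation rate.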


Inference of most recent coancestry along 
each chromosome 


ChromoPainter was used to infer the most 
recent coancestry along chromosomes, and 
fineSTRUCTURE was used to identify rela- 
tionships between individuals on the basis 
of their most recent coancestry. We used 
Globetrotter to compute P values for a coan- 
cestry contribution from recent admixture. 


Functional variation 


Functional genetic variation among study animals 
was examined using PLINK for association an- 
alyses and OmegaPlus for identification of se- 
lective sweeps. We performed differentiation-based 
scans for selection using windowed F_ST values.
INTRODUCTION: Millions of people have received
genome and exome sequencing to date, a collective
effort that has illuminated for the first
time the vast catalog of small genetic differ- 
ences that distinguish us as individuals within 
our species. However, the effects of most of these 
genetic variants remain unknown, limiting their 
clinical utility and actionability. New approaches 
that can accurately discern disease-causing from 
benign mutations and interpret genetic variants 
on a genome-wide scale would constitute a 
meaningful initial step towards realizing the 
potential of personalized genomic medicine. 


RATIONALE: As a result of the short evolution- 
ary distance between humans and nonhuman 
primates, our proteins share near-perfect amino 
acid sequence identity. Hence, the effects of a
protein-altering mutation found in one species
are likely to be concordant in the other species.
By systematically cataloging common variants
of nonhuman primates, we aimed to annotate
these variants as being unlikely to cause human
disease as they are tolerated by natural selection
in a closely related species. Once collected,
the resulting resource may be applied to infer
the effects of unobserved variants across the
genome using machine learning.


PrimateAI-3D, a deep learning model
trained on millions of benign primate
variants. Common primate variants
were validated as benign (98.7%) in the
human ClinVar database. Voxelized protein
structures (middle) with benign primate
variants (spheres) were used to train a 3D
convolution neural network to predict
variant pathogenicity based on regional
enrichment or depletion of primate variants.
The resulting model was validated in
independent clinical cohorts, as illustrated
by the correlation of PrimateAI-3D scores
and blood cholesterol levels for UK Biobank
individuals (right).
Figure panels: 4.3 million common benign variants from 233 primate species; validation of primate variants in human clinical variant database; benign primate variants superimposed on 3D protein structures; 98.7% of common primate variants in ClinVar are benign; deep learning models.


RESULTS: Following the strategy outlined above,
we obtained whole-genome sequencing data for
809 individuals from 233 primate species and
cataloged 4.3 million common missense variants.
We confirmed that human missense variants
seen in at least one nonhuman primate
species were annotated as benign in the ClinVar
clinical variant database in 99% of cases. By contrast,
common variants from mammals and
vertebrates outside the primate lineage were 
substantially less likely to be benign in the 
ClinVar database (71 to 87% benign), restrict- 
ing this strategy to nonhuman primates. Over- 
all, we reclassified more than 4 million human 
missense variants of previously unknown con- 
sequence as likely benign, resulting in a greater 
than 50-fold increase in the number of anno- 
tated missense variants compared to existing 
clinical databases. 

To infer the pathogenicity of the remaining 
missense variants in the human genome, we 
constructed PrimateAI-3D, a semisupervised 
3D-convolutional neural network that oper- 
ates on voxelized protein structures. We trained 
PrimateAI-3D to separate common primate 
variants from matched control variants in 3D 
space as a semisupervised learning task. We 
evaluated the trained PrimateAI-3D model 
alongside 15 other published machine learning 
methods on their ability to distinguish between 
benign and pathogenic variants in six different 
clinical benchmarks and demonstrated that 
PrimateAI-3D outperformed all other classi- 
fiers in each of the tasks. 
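Voxelization, the step that turns a protein structure into a regular 3D grid suitable for a convolutional network, can be illustrated generically. The sketch below bins 3D coordinates into cubic voxels and counts the points in each; the voxel size, the function name, and the toy coordinates are assumptions, and this is not the featurization used by PrimateAI-3D.

```python
from collections import Counter
from math import floor


def voxelize(points, voxel_size=2.0):
    """Count points (e.g., atom coordinates in angstroms) per cubic voxel.

    Returns a Counter mapping integer voxel indices (i, j, k) to counts.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    grid = Counter()
    for x, y, z in points:
        index = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        grid[index] += 1
    return grid


# Hypothetical coordinates for four atoms.
atoms = [(0.5, 0.5, 0.5), (1.9, 0.1, 0.3), (2.1, 0.0, 0.0), (-0.1, 0.0, 0.0)]
grid = voxelize(atoms)
print(dict(grid))
```

A practical model would typically store several channels per voxel rather than a single count, for example distances to different atom types or the positions of known benign variants.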


CONCLUSION: Our study addresses one of the 
key challenges in the variant interpretation 
field, namely, the lack of sufficient labeled 
data to effectively train large machine learning
models. By generating the most comprehensive
primate sequencing dataset to date and
pairing this resource with a deep learning ar- 
chitecture that leverages 3D protein structures, 
we were able to achieve meaningful improve- 
ments in variant effect prediction across mul- 
tiple clinical benchmarks. 
Personalized genome sequencing has revealed millions of genetic differences between individuals, but
our understanding of their clinical relevance remains largely incomplete. To systematically decipher 
the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals 
from 233 primate species and identified 4.3 million common protein-altering variants with orthologs 
in humans. We show that these variants can be inferred to have nondeleterious effects in humans 
based on their presence at high allele frequencies in other primate populations. We use this resource 
to classify 6% of all possible human protein-altering variants as likely benign and impute the 
pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy 
for diagnosing pathogenic variants in patients with genetic diseases. 


A scalable approach for interpreting the
effects of human genetic variants and
their impact on disease risk is urgently
needed to realize the promise of personalized
genomic medicine (1-3). Out of
more than 70 million possible protein-altering 
variants in the human genome, only ~0.1% are 
annotated in clinical variant databases such as 
ClinVar (4), with the remainder being variants 
of uncertain clinical significance (5, 6). Despite 
collaborative efforts by the scientific commu- 
nity, the rarity of most human genetic variants 
has meant that progress toward deciphering 
personal genomes has been incremental (7, 8). 
Consequently, clinical sequencing tests fre- 
quently return without definitive diagnoses, a 
frustrating outcome for both patients and clinicians
(9, 10). In certain cases, patients must be
recontacted and diagnoses reversed when the
presumed pathogenic variant is later found
to be a common variant in previously understudied
human populations (11-13). Common
variants can often be ruled out as the cause of 
penetrant genetic disease, because their high 
frequency in the population indicates that 
they are tolerated by natural selection, aside 
from rare exceptions due to founder effects 
and balancing selection (14-16). 

An emerging strategy for solving clinical 
variant interpretation on a genome-wide scale 
is the use of information from closely related 
primate species to infer the pathogenicity of 
orthologous human variants (17). Because chimpanzees
and humans share 99.4% protein
sequence identity (18), a protein-altering variant
present in one species can be expected to
produce similar effects on the protein in the 
other species. By conducting population se- 
quencing studies in closely related nonhuman 
primate species, it is feasible to systematically 
catalog common variants and rule these out as 
pathogenic in humans, analogous to how se- 
quencing more diverse human populations 
has helped to advance clinical variant interpretation
(8, 17). Nonetheless, earlier work (17)
was limited by the very small primate pop- 
ulation sequencing datasets available, which 
bounded the number of common variants dis- 
covered and the scale of machine learning 
classifiers that could be trained. 


Results 
A database of 4.3 million benign missense 
variants across the primate lineage 


To expand upon this strategy, we sequenced 
703 individuals from 211 primate species and 
aggregated these with data from previous 
studies (19-26), yielding a total of 809 individ- 
uals from 233 species. We identified 4.3 million 
unique missense (protein-altering) variants 
and 6.7 million unique synonymous (nonpro- 
tein altering) variants (Fig. 1A), after excluding 
variants at positions that lacked unambiguous 
1:1 mapping with humans, or that resulted 
in nonconcordant amino acid translation 
outcomes because of changes at neighboring 
nucleotides (fig. S1). The species selected for 
sequencing represent close to half of the 521 
extant primate species on Earth (27) and cover 
all major primate families, from Old World 
monkeys and New World monkeys to lemurs 
and tarsiers. We targeted a small number of 
individuals per species (3.5 on average) to 
ensure that we primarily sampled common 
variants that have been filtered by natural se- 
lection rather than rare mutations (fig. S2). 
Compared with the genome Aggregation 
Database (gnomAD) cohort of 141,456 human 
individuals from diverse populations (28, 29), 
the primate sequencing cohort contained 
~20% more exome variants despite sequenc- 
ing 1/175th the number of individuals (Fig. 1A 
and fig. S3), attesting to the notable genetic
diversity present in nonhuman primate species
(19, 30), many of which are critically endangered
(31). The overlap of primate variants
with gnomAD was low, consistent with independent
mutational origins in each species (fig.
S3). Out of the 22 million possible synonymous
variants in the human genome, 30% were ob- 
served in the primate cohort, compared with 
just 6% of possible missense mutations (Fig. 1B). 
Because de novo mutations would have laid 
down unbiased proportions of missense and 
synonymous variants, the observed depletion 
of missense mutations in the primate cohort 
is consistent with most of the newly-arising 
human missense mutations being removed by 
natural selection as a result of their deleterious- 
ness (8, 32-34). The surviving missense variants 
are seen at high frequencies in primate popula- 
tions and represent a subset of missense var- 
iants that have tolerated filtering by natural 
selection and are unlikely to be pathogenic (35). 
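
As a rough worked example of the depletion argument above, the sketch below compares the observed fractions of synonymous (30%) and missense (6%) variants reported in Fig. 1B. It assumes, as a simplification not stated in this form in the text, that synonymous variants are effectively neutral and that both classes were sampled to the same depth; `relative_depletion` is a hypothetical helper name.

```python
def relative_depletion(frac_missense: float, frac_synonymous: float) -> float:
    """Fraction of missense variants missing relative to a neutral (synonymous) baseline."""
    if not 0.0 < frac_synonymous <= 1.0:
        raise ValueError("synonymous fraction must be in (0, 1]")
    return 1.0 - frac_missense / frac_synonymous


# Fractions reported in the text (Fig. 1B).
depletion = relative_depletion(frac_missense=0.06, frac_synonymous=0.30)
print(f"missense depletion relative to synonymous: {depletion:.0%}")
```

Under these assumptions, roughly four in five possible missense changes are missing relative to the synonymous baseline, which is in line with the statement that most newly arising missense mutations are removed by selection.
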
Missense variants from the primate cohort 
are strongly enriched for benign consequence 
in the ClinVar clinical variant database (Fig. 1C). 
Among ClinVar variants with higher review 
levels (two stars or above, indicating consen- 
sus by multiple submitters) (4), missense var- 
iants found in at least one nonhuman primate 
species were benign or likely benign ~99% of 
the time, compared with 63% for ClinVar mis- 
sense variants in general and 80% for missense 
variants seen in gnomAD (Fig. 1C). The high 
fraction of pathogenic variants in gnomAD is 
consistent with most of these variants having 
arisen recently. Indeed, recent exponential hu- 
man population growth introduced large num- 
bers of rare variants through random de novo 
mutation (95% of variants in the gnomAD 
cohort are at <0.01% population allele fre- 
quency), without sufficient time for selection 
to purge deleterious variants from the popula- 
tion (36-40). Consequently, the gnomAD cohort 
provides a comparatively unfiltered look at var- 
iation caused by random mutations, whereas 
primate common variants represent the subset 
of random mutations that have survived. 

The regions of human disease genes that 
were most densely populated by ClinVar pathogenic
variants were also strongly depleted for
primate common variants, with examples shown
for CACNA1A (Fig. 1D) and CREBBP (fig. S4),
genes responsible for familial epilepsy (41, 42) 
and Rubinstein-Taybi syndrome (43, 44). Mis- 
sense variants in the gnomAD cohort were par- 
tially depleted within these same critical regions 
(Fig. 1D and fig. S4), indicating that humans 
and primates experience similar selective pres- 
sures. However, deleterious variants were in- 
completely removed in humans, consistent with 
the shorter amount of time they were exposed 
to natural selection. 

Prior to using primate data as an indicator 
of benign consequence in a diagnostic setting, 
it is vital to understand why a handful of hu- 
man pathogenic ClinVar variants appear as 
tolerated common variants in primates. Our 
clinical laboratory independently reviewed 
evidence for each of the 36 ClinVar patho- 
genic variants that appeared in the primate 
cohort, according to ACMG guidelines (14).
Among these 36 variants, 8 were reclassified
as variants of uncertain significance based on
insufficient evidence of pathogenicity in the
literature and an additional 9 were hypomorphic
or mild clinical variants (table S1). The remaining 
19 variants appear to be truly pathogenic in 
humans and are presumably tolerated in pri- 
mates because of primate-human differences, 
such as interactions with changes in the neigh- 
boring sequence context (45, 46). In one such 
example, a compensatory synonymous sequence 
change at an adjacent nucleotide explains why 
the variant is benign in primates but creates a 
pathogenic splice defect in humans (Fig. 1E). 
We also expect that some of the variants iden- 
tified among primates are rare pathogenic var- 
iants by chance, despite the small number of 
individuals sequenced within each species. 
By expanding our cohort to sequence a large 
number of individuals per species, we would 
definitively exclude rare variation from our 
catalog of primate variation, as well as grow 
the database of benign variants to improve 
clinical variant interpretation. 

As evolutionary distance from humans in- 
creases, cases in which the surrounding sequence 
context has changed sufficiently to alter the 
Fig. 1. Common primate 
variants are largely benign 
in humans. (A) Counts 

of missense (solid green) and 
synonymous (shaded gray) 
variants from primates 
compared with the gnomAD 
database. Missense:synonymous 
counts and ratios are displayed 
above each bar. (B) Fractions 
of all possible human synony- 
mous (gray) and missense 
variants (green) observed in 
primates. (C) Counts of benign 
(gray) and pathogenic (red) 
missense variants with two-star 
review status or above in 

the overall ClinVar database 
(left pie chart), compared with 
ClinVar variants observed in 
gnomAD (middle), and com- 
pared with ClinVar variants 
observed in primates (right). 
Conflicting benign and patho- 
genic annotations and variants 
interpreted only with uncertain 
significance were excluded. 
(D) Observed gnomAD (green) 
or primate (blue) missense 
variants in each amino acid 
position in the CACNA1A gene. 
Red circles represent the 
positions of annotated ClinVar 
pathogenic missense variants. 
Bottom scatterplot shows 
PrimateAI-3D predicted pathogenicity
scores for all possible
missense substitutions along 
the gene. (E) Multiple sequence 
alignment showing the 

ClinVar pathogenic variant 
chr11:77181548 G>A (red 
arrow) creating a cryptic splice 
site in human sequence 
(extended splice motif, blue). 
This variant is tolerated in 
Cebus albifrons and other 
species with a G>C synonymous 
change in the adjacent nucleo- 
tide that stops the splice 

motif from forming. (F) Pie 
charts showing the fraction of 
benign (gray) and pathogenic 
(red) missense variants with 
effect of the variant should also increase until 
common variants in more-distant species 
could no longer be reliably counted on as be- 
nign in humans. We examined variation in 
each major branch of the primate tree as well 
as variation from mammals (mouse, rat, cow, 
dog), chicken, and zebrafish and evaluated 
their pathogenicity in ClinVar (Fig. 1F). Com- 
mon variants from species throughout the pri- 
mate lineage, including more-distant branches 
such as lemurs and tarsiers, varied from 98.6 to 
99% benign in the human ClinVar database, but
this dropped to 87% for placental mammals and
71% for chicken. The high fraction of variants 
that are pathogenic in humans yet tolerated as 
common variants in more distant vertebrates 
indicates that selection on orthologous var- 
iants diverges substantially in distantly related 
species as a consequence of changes in the sur- 
rounding sequence context and other differences 
in species’ biology (fig. S5). 

We have made the primate population variant 
database, which contains more than 4.3 mil- 
lion likely benign missense variants, publicly 
available at https://primad.basespace.illumina.com
as a reference for the genomics community.
Overall, this resource is over 50 times larger
than ClinVar in terms of number of annotated 
missense variants and consists almost entirely 
of variants of previously unknown significance. 
Most primate variants are rare or absent in 
the human population, with 98% of these var- 
iants at allele frequency <0.01% (fig. S6). This 
makes it challenging to establish their patho- 
genicity through other means, because even the 
largest sequencing laboratories would be un- 
likely to observe any given variant in more than 
one unrelated patient. Despite their rarity, the 
subset of human variants that appear in pri- 
mates have a low missense:synonymous ratio 
consistent with being depleted of deleterious 
missense variants (Fig. 1G). This contrasts with 
the high missense:synonymous ratio for rare 
human variants in the overall gnomAD cohort, 
which approaches the 2.2:1 ratio expected for 
random de novo mutations in the absence of 
selective constraint (47). At higher allele fre- 
quencies, natural selection has had more time 
to purge deleterious missense variants, allow- 
ing the human missense:synonymous ratio to 
start to converge toward the ratio observed for 
the subset of human variants that are present 
in other primates. 


Gene-level selective constraint in humans 
versus nonhuman primates 


The primate variant resource makes it possible 
to compare natural selection acting on indi- 
vidual genes across the primate lineage and 
identify human-specific evolutionary differences. 
Because the current primate cohort only con- 
tains an average of 3 to 4 individuals per species, 
we focused on comparing selective constraint 
in human genes versus primates as a whole. 
We found that the missense:synonymous ra- 
tios of individual genes were well-correlated 
between humans and primates (Spearman r = 
0.637) (Fig. 2A), indicating that genes that 
were depleted for deleterious missense muta- 
tions in humans were also consistently depleted 
throughout the primate lineage. Moreover, the 
missense:synonymous ratios of both human 
and primate genes correlated similarly well 
with the probability of genes being loss of 
function intolerant (pLI) (Spearman correlation
-0.534 and -0.489, respectively) (28). Had
there been substantial divergence between 
humans and primates, pLI, an independent 
metric derived from human protein-truncating 
variation, would have been expected to show 
much clearer agreement with human missense:synonymous
ratios than with primate ratios.

Fig. 2. Selective constraint of primate genes compared with humans. (A) Scatter plot of missense:synonymous
ratios between primate and human genes. Each gene is colored by its pLI score, with darker
points showing haploinsufficient genes. (B) Observed and expected counts of synonymous (top) and
missense (bottom) variants per gene in gnomAD (left) and primates (right). Genes are colored by their pLI
scores. (C) Distributions of observed and expected ratios of synonymous (dashed lines) and missense (solid
lines) variants for all genes. Results for primate genes (orange) and gnomAD genes (blue) are shown.
(D) Scatter plot of missense:synonymous ratios between primate and human genes. Highlighted points are
genes that are under significantly stronger (blue) or weaker (red) constraint in humans compared with
nonhuman primates under both methods (Benjamini-Hochberg FDR < 0.05) and gray points show
nonsignificant genes. The top 10 genes with the largest effect sizes in either direction are labeled.

To measure the selective constraint on each 
gene, we calculated the observed versus ex- 
pected number of variants per gene, using tri- 
nucleotide mutation rates to model the expected 
probability of observing each variant (fig. S7) 
(28, 29). We modeled each primate species 
separately to account for differences in genetic 
diversity and the number of individuals sampled 
per species. The expected and observed counts 
of synonymous variants were highly corre- 
lated in both the gnomAD and primate cohorts, 
indicating that our model accurately captured 
the background distribution of neutral muta- 
tions (Fig. 2B; Spearman correlation 0.933 and 
0.949, respectively). By contrast, for missense 
variants the expected and observed counts per 
gene diverged substantially (Spearman corre- 
lation 0.896 and 0.561 for humans and primates, 
respectively), due to depletion of deleterious 
missense variants by natural selection in highly 
constrained genes (for example, high pLI genes). 
The most highly constrained genes were almost 
completely scrubbed of common missense var- 
iants in the primate cohort, whereas rare mis- 
sense variants in the gnomAD cohort were 
depleted to a more modest extent because of 
the large sample size of gnomAD (Fig. 2C). 
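
The observed-versus-expected calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' model: the mutation rates, gene names, and counts are invented, a single pooled cohort replaces the per-species modeling described in the text, and calibrating the scale on synonymous variants is an assumption made here to keep the example self-contained.

```python
from collections import defaultdict

# Hypothetical mutation rates keyed by (trinucleotide context, alternate base).
MUTATION_RATE = {("ACG", "T"): 8.0e-8, ("ACA", "G"): 1.0e-8, ("TTC", "A"): 0.5e-8}


def summed_rate(sites):
    """Total mutation rate over (gene, trinucleotide, alt) sites, grouped by gene."""
    totals = defaultdict(float)
    for gene, context, alt in sites:
        totals[gene] += MUTATION_RATE[(context, alt)]
    return dict(totals)


def observed_over_expected(observed, rates, scale):
    """Observed count divided by the scaled mutation-rate expectation, per gene."""
    return {gene: observed.get(gene, 0) / (scale * rate) for gene, rate in rates.items()}


# Toy data: every possible synonymous and missense site in two made-up genes.
synonymous_sites = [("GENE_A", "ACG", "T"), ("GENE_B", "ACG", "T"), ("GENE_B", "ACA", "G")]
missense_sites = [("GENE_A", "ACA", "G"), ("GENE_A", "ACG", "T"), ("GENE_B", "TTC", "A")]
observed_synonymous = {"GENE_A": 4, "GENE_B": 5}
observed_missense = {"GENE_A": 1, "GENE_B": 0}

# Calibrate the scale so that expected synonymous counts sum to the observed total.
syn_rates = summed_rate(synonymous_sites)
scale = sum(observed_synonymous.values()) / sum(syn_rates.values())

missense_oe = observed_over_expected(observed_missense, summed_rate(missense_sites), scale)
print(missense_oe)
```

A ratio well below 1 for missense variants, alongside a ratio near 1 for synonymous variants, is the pattern the text attributes to selective constraint.
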

We next aimed to identify genes whose se- 
lective constraint was different in humans 
compared with the rest of the primate lineage, 
a task made difficult by differences in diver- 
sity, allele frequency, and sample size between 
the human and primate cohorts (34, 48, 49). 
To this end, we developed two orthogonal strat- 
egies and took the intersection of genes iden- 
tified under both approaches. First, we used 
population genetic modeling (34, 50, 51) to 
estimate the average selection coefficient, s, 
ranging from 0 (benign) to 1 (severely pathogenic),
of missense mutations in each gene,
using a model of recent human population
growth (figs. S7 and S8). We fit a single value of
s per gene across nonhuman primate species
and identified genes that differed between
s_primate and s_human using a likelihood ratio test,
which we validated using population simula- 
tions (fig. S9). In a second approach, we fit a 
curve approximating the relationship between 
human and primate missense:synonymous 
ratios using a Poisson generalized linear mixed 
model (52) and identified genes in which the 
observed human missense:synonymous ratio 
deviated from what would have been expected 
given the gene’s missense:synonymous ratio 
in primates (fig. S10). We also adjusted for gene 
length to account for shorter genes having more 
variability in their missense:synonymous ratio 
measurements than longer genes. The two meth- 
ods were broadly concordant, with a Spearman 
correlation of 0.80 between the genes’ effect 
sizes in the two tests. Estimates of selection co- 
efficients and observed and expected counts 
for each gene in humans and primates are provided
in table S2.
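
Taking the intersection of genes called by both approaches, with the Benjamini-Hochberg FDR cutoff of 0.05 reported for these tests, can be sketched as below. The p-values and gene names are invented, and `benjamini_hochberg` and `significant_genes` are hypothetical helper names; the likelihood ratio test and the mixed model themselves are not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, fdr=0.05):
    """Genes whose adjusted p-value falls below the FDR cutoff."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < fdr}


# Made-up p-values for the two tests (likelihood ratio test and mixed model).
lrt_p = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.30, "GENE_D": 0.004}
glmm_p = {"GENE_A": 0.002, "GENE_B": 0.40, "GENE_C": 0.01, "GENE_D": 0.003}

both = significant_genes(lrt_p) & significant_genes(glmm_p)
print(sorted(both))
```

Requiring significance under both tests is conservative: a gene flagged by only one method, such as GENE_B or GENE_C in the toy data, is dropped.
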


Gao et al., Science 380, eabn8197 (2023) 


2 June 2023 


PRIMATE GENOMES 


In total, we found 39 genes in which selec- 
tive constraint differed significantly between 
humans and other primates under both meth- 
ods [Benjamini-Hochberg FDR < 0.05 (53); 
Fig. 2D]. The top three genes in which s_human
decreased the most relative to s_primate were
CFTR, GJB2, and CD36, autosomal recessive 
disease genes for cystic fibrosis (54), hered- 
itary deafness (55), and platelet glycoprotein 
deficiency (56), respectively. All three genes 
are known for deleterious mutations that are 
unusually common in local geographic human 
populations (57-60), suggesting that they may 
be experiencing reduced selection due to het- 
erozygote advantage that protects against spe- 
cific environmental pathogens (60-64). On the 
other end of the spectrum, TERT, known for
its role in maintaining telomere length (65, 66),
was among the top genes in which s_human increased
the most relative to s_primate. Humans
have adapted to a much longer life span com- 
pared with other primate species, which have 
a median life span of 20 to 30 years, suggesting 
that increased selection on TERT may have 
occurred as part of human adaptation toward 
extended longevity. We note that with the cur- 
rent size of the primate cohort, it is not possible 
to distinguish whether the increased selec- 
tion on TERT occurred only in humans, or if 
it is part of a gradual trend toward extended 
longevity that began earlier in the great ape 
lineage, whose members also have longer life spans relative
to other primates (~40 years). Expanding
the primate cohort by sequencing more indi- 
viduals per species would improve detection of 
additional species-specific and lineage-specific 
evolutionary adaptations and shed light on 
the evolutionary path that led to the present 
human condition. 


PrimateAI-3D, a deep learning network 
for classifying protein-altering variants 


We constructed PrimateAI-3D, a semisuper- 
vised 3D convolutional neural network for 
variant pathogenicity prediction, which we 
trained using 4.5 million common missense 
variants with likely benign consequence (Fig. 
3A). In a departure from prior deep learning 
architectures that operated on linear sequences 
(17, 67), we voxelized the 3D structure of the 
protein at 2 Å resolution (figs. S11 and S12) and 
used 3D convolutions to enable the network to 
recognize key structural regions that may not be 
apparent from sequence alone (Fig. 3A). As an 
example, we show PrimateAI-3D predictions for 
STK11 (Fig. 3B), the tumor suppressor gene responsible
for Peutz-Jeghers hereditary polyposis
syndrome (68-71), with each amino acid po- 
sition colored by the average PrimateAI-3D 
score at that position. Common primate var- 
iants used for training and annotated ClinVar 
pathogenic variants from separate parts of the 
linear sequence form distinct clusters in 3D 
space. Although ClinVar variants are shown
for illustration, it should be noted that the
network was not trained on either human- 
engineered features or annotated variants 
from clinical variant databases, thereby avoid- 
ing potential human biases in variant annota- 
tion. Rather, it learns to infer pathogenicity 
based on the local enrichment or depletion of 
common primate variants, taking only the pro- 
tein’s multiple sequence alignment and 3D 
structure as inputs. 
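
To make the voxelization step concrete, the sketch below bins atom coordinates into a cubic grid of 2 Å voxels centered on a target residue. It is a simple occupancy grid under stated assumptions: the actual input features are described in figs. S11 and S12 and are not reproduced here, and the grid size of 7 voxels per side, the coordinates, and the function name `voxelize` are all illustrative choices.

```python
import math


def voxelize(atoms, center, grid_size=7, resolution=2.0):
    """Count atoms per voxel in a cubic grid centered on `center`.

    atoms: iterable of (x, y, z) coordinates in angstroms.
    Returns {(i, j, k): atom_count} for voxels that fall inside the grid.
    """
    half_width = grid_size * resolution / 2.0
    grid = {}
    for atom in atoms:
        # Shift so the grid corner sits at the origin, then bin by voxel size.
        index = tuple(
            math.floor((coord - c + half_width) / resolution)
            for coord, c in zip(atom, center)
        )
        if all(0 <= i < grid_size for i in index):
            grid[index] = grid.get(index, 0) + 1
    return grid


# Toy coordinates around a target residue placed at the origin.
atoms = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.9, 0.0, 0.0), (100.0, 0.0, 0.0)]
print(voxelize(atoms, center=(0.0, 0.0, 0.0)))
```

A grid like this lets 3D convolutions see residues that are close in space even when they are far apart in the linear sequence, which is the motivation given in the text.
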

PrimateAI-3D can use protein structures 
from either experimental sources or computa- 
tional prediction (72-76); we used AlphaFold 
DB (72, 73) and HHpred (74) predicted struc- 
tures for the broadest coverage across human 
genes. For training data, we incorporated all 
common missense variants from the 233 non- 
human primate species (7) and common hu- 
man missense variants (allele frequency > 
0.1% across populations) in gnomAD (28, 29), 
TOPMed (77, 78), and UK Biobank (UKBB) 
(79, 80), resulting in a total of 4.5 million unique 
missense variants of likely benign consequence. 
This dataset covers 6.34% of all possible hu- 
man missense variants and is over 50 times 
larger than the current ClinVar database (79,381 
missense variants after excluding variants of 
uncertain significance and those with conflict- 
ing annotations), greatly enlarging the train- 
ing dataset available for machine learning 
approaches. Because the training dataset con- 
sists only of variants labeled as benign, we 
created a control set of randomly selected var- 
iants that were matched to the common var- 
iants by trinucleotide mutation rate and trained 
PrimateAI-3D to separate common variants 
from matched controls as a semisupervised 
learning task. 
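
One way to build such a matched control set is sketched below. It assumes that matching on the exact trinucleotide context and substitution is an acceptable stand-in for matching on mutation rate; the variant identifiers, the context labels, and the function name `sample_matched_controls` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_matched_controls(common_variants, candidate_pool, seed=0):
    """Draw one control per common variant from the same mutational context.

    Both inputs are lists of (variant_id, context) tuples; the pool should hold
    possible variants that were not observed as common.
    """
    rng = random.Random(seed)
    pool_by_context = defaultdict(list)
    for variant_id, context in candidate_pool:
        pool_by_context[context].append(variant_id)
    for ids in pool_by_context.values():
        rng.shuffle(ids)

    controls = []
    for _, context in common_variants:
        bucket = pool_by_context[context]
        if bucket:  # skip if the pool has run out of variants in this context
            controls.append((bucket.pop(), context))
    return controls


# Made-up variants labeled by trinucleotide context and substitution.
common = [("c1", "ACG>T"), ("c2", "ACG>T"), ("c3", "TTC>A")]
pool = [("p1", "ACG>T"), ("p2", "ACG>T"), ("p3", "ACG>T"), ("p4", "TTC>A"), ("p5", "GGA>C")]
controls = sample_matched_controls(common, pool)
print(Counter(context for _, context in controls))
```

Because the controls share the mutational spectrum of the common variants, a classifier cannot separate the two sets on mutation rate alone and has to rely on other signals.
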

In parallel with the variant classification 
task, we generated amino acid substitution 
probabilities for each position in the protein by 
masking the residue and using the sequence 
context to predict the missing amino acid, 
borrowing from language model architectures 
that are trained to predict missing words in 
sentences (81, 82). We trained both a 3D convolutional
“fill-in-the-blank” model, which tasked
the network with predicting the missing ami- 
no acid in a gap in the voxelized 3D protein 
structure, and separately, a language model 
using the transformer architecture to predict 
the missing amino acid using the surrounding 
multiple sequence alignment as context (83). 
We implemented these models as additional 
loss functions to further refine the PrimateAI- 
3D predictions (fig. S13). We also trained a 
variational autoencoder (67) on multiple se- 
quence alignments and found that it performed 
comparably to our transformer architecture 
(fig. S14). Hence, we incorporated the aver- 
age of their predictions in the loss function, 
which performed better than either alone. 

We evaluated PrimateAI-3D and 15 other 
published machine learning methods (67, 84) 

Fig. 3. PrimateAI-3D architecture and variant classification performance. (A) PrimateAI-3D workflow. Human
protein structures and multiple sequence alignments are voxelized (left) as input to a 3D convolutional neural network
that predicts pathogenicity of all possible point mutations of a target residue (middle). The network is trained
using a loss function with three components (right): common human and primate variants; fill-in-the-blank of a protein
structure; score ranks from language models. (B) Protein structure of the STK11 gene, colored by PrimateAI-3D
pathogenicity prediction scores (blue, benign; red, pathogenic). Spheres indicate residues with common human and
primate variants (left) or residues with pathogenic mutations from ClinVar (right). For spheres, the color corresponds to
the pathogenicity score of only the variant. For other residues, pathogenicity scores are averaged over all variants
at that site. (C) Scatterplot shows performance of methods that predict missense variant pathogenicity in two clinical
benchmarks (DDD and UKBB). Datasets are a subset of variants for which all methods have predictions. (D) Six
barplots show method performance for six testing datasets (DMS assays, UKBB, ClinVar, DDD, ASD, and CHD).

on their ability to distinguish between benign 
and pathogenic variants along six different 
axes (Fig. 3, C and D, and fig. S15): predict- 
ing the effects of rare missense variants on 
quantitative clinical phenotypes in a cohort of 
200,643 individuals from the UKBB; distin- 
guishing missense de novo mutations (DNM) 
seen in 31,058 patients with neurodevelop- 
mental disorders (DDD) (85-87) from de novo 
missense mutations in 2555 healthy controls 
(88-93); distinguishing de novo missense mu- 
tations seen in 4295 patients with autism spec- 
trum disorders (ASD) (88-94) from de novo 
missense mutations in the shared set of 2555 
healthy controls; distinguishing de novo mis- 
sense mutations seen in 2871 patients with 
congenital heart disease (CHD) (95) from de 
novo missense mutations in the shared set of 
2555 healthy controls; separating annotated 
ClinVar benign and pathogenic variants (ClinVar) 
(4); and average correlation with in vitro deep 
mutational scan (DMS) experimental assays 
across nine genes (96-105). Our set of clinical 
benchmarks is the most comprehensive to 
date and has a particular focus on rigorously 
testing the performance of classifiers on large 
patient cohorts across a diverse range of real- 
world clinical settings (table S3). 

For the UKBB benchmark, we analyzed 
200,643 individuals with both exome sequenc- 
ing data and broad clinical phenotyping and 
identified 42 genes in which the presence of 
rare missense variants was associated with 
changes in a quantitative clinical phenotype 
controlling for confounders such as popula- 
tion stratification, age, sex, and medications 
(table S4). These gene-phenotype associations 
included diverse clinical lab measurements 
such as low-density lipoprotein (LDL) choles- 
terol (increased by rare missense variants in 
LDLR, decreased by variants in PCSK9), blood 
glucose (increased by variants in GCK), and 
platelet count (increased by variants in JAK2, 
decreased by variants in GP1BB), as well as 
other quantitative phenotypes such as stand- 
ing height (increased by variants in ZFAT) 
(table S4). To test each classifier’s ability to 
distinguish between pathogenic and benign 
missense variants, we measured the correla- 
tion between pathogenicity prediction score 
and quantitative phenotype for patients carry- 
ing rare missense variants in each of these 
genes. We report the average correlation across 
all gene-phenotype pairs for each classifier, 
taking the absolute value of the correlation 
because these genes may be associated with 
either increase or decrease in the quantita- 
tive clinical phenotype. 
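
The UKBB metric described above, an average of absolute correlations across gene-phenotype pairs, can be sketched as follows. The text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, and the scores and phenotype values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_correlation(pairs):
    """pairs: {(gene, phenotype): (scores, phenotype_values)} for variant carriers."""
    values = [abs(pearson(scores, pheno)) for scores, pheno in pairs.values()]
    return sum(values) / len(values)


# Made-up carriers: predicted score per carrier and that carrier's measurement.
toy = {
    ("LDLR", "LDL cholesterol"): ([0.1, 0.5, 0.9], [100.0, 130.0, 190.0]),
    ("PCSK9", "LDL cholesterol"): ([0.2, 0.6, 0.8], [120.0, 90.0, 60.0]),
}
print(round(mean_absolute_correlation(toy), 3))
```

Taking the absolute value means a gene where damaging variants lower the phenotype (PCSK9 in the toy data) contributes as much as one where they raise it (LDLR).
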

The DDD, ASD, and CHD cohorts are among 
the largest published trio-sequencing studies 
to date and consist of thousands of families 
with a child with rare genetic disease and their 
unaffected parents. In each cohort, we cata- 
loged de novo missense mutations that appeared 

Fig. 4. Impact of training dataset
size on classification accuracy.
(A) Improved performance of
PrimateAI-3D with increasing number
of common human and primate
variants in the training dataset
(x-axis). Performance of each dataset
(y-axis, accuracy as a fraction of the maximum) was divided by the
maximum performance observed
across all training dataset sizes.
(B) Cumulative fractions of all
possible human synonymous (gray)
and missense (green) variants
observed as common variants in
234 primate species, including
humans (allele frequency > 0.1%).
Each point shows the average of
10 permutations, calculated with a
different random ordering of the list
of primate species each time.


in affected probands but were absent in their 
parents, as well as de novo missense muta- 
tions that appeared in a set of shared healthy 
controls. We evaluated the ability of each clas- 
sifier to separate the de novo missense muta- 
tions that appear in cases versus controls on 
the basis of their prediction scores, using the 
Mann-Whitney U test to measure performance. 
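
The case-versus-control comparison rests on the Mann-Whitney U statistic, which counts how often a case variant scores above a control variant. The sketch below computes the statistic directly with a pairwise loop; the scores are invented, no P value is derived, and the rescaled `separation` value is added here only as a convenient summary.

```python
def mann_whitney_u(case_scores, control_scores):
    """U statistic for cases: pairs where the case outranks the control (ties count 0.5)."""
    u = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                u += 1.0
            elif case == control:
                u += 0.5
    return u


def separation(case_scores, control_scores):
    """U rescaled to [0, 1]; 0.5 means no separation between the groups."""
    return mann_whitney_u(case_scores, control_scores) / (
        len(case_scores) * len(control_scores)
    )


cases = [0.9, 0.8, 0.4]      # made-up scores for de novo variants in probands
controls = [0.5, 0.3, 0.1]   # made-up scores for de novo variants in controls
print(mann_whitney_u(cases, controls), separation(cases, controls))
```

The pairwise loop is quadratic in the number of variants; for cohorts of the size described here, a rank-based implementation from a statistics library would be the practical choice.
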

PrimateAI-3D outperformed all other clas- 
sifiers at distinguishing pathogenic from be- 
nign variants in the four patient cohorts we 
tested (UKBB, DDD, ASD, CHD); it was also 
the top performer at separating pathogenic 
from benign variants in the ClinVar annota- 
tion database and had the highest average cor- 
relation with the deep mutational scan assays 
(Fig. 3D and fig. S15). After PrimateAI-3D there 
was no clear runner-up, with second place oc- 
cupied by six different classifiers in the six dif- 
ferent benchmarks. We observed a moderate 
correlation between the performance of differ- 
ent classifiers in UKBB and DDD (Spearman 
r = 0.556; Fig. 3C), which are the two largest 
clinical cohorts and therefore likely the most 
robust for benchmarking (with 200,643 and 
33,613 patients, respectively), but outside of 
PrimateAI-3D, strong performance of a classi- 
fier on one task had limited generalizability to 
other tasks. Our results underscore the impor- 
tance of validating machine learning classifiers 
along multiple dimensions, particularly in large 
real-world cohorts, to avoid overgeneralizing a 
classifier’s performance based upon a notable 
showing along a single axis. 

PrimateAI-3D’s top-ranked performance at 
separating benign and pathogenic missense 
variants in ClinVar was unexpected, as the 
other machine learning classifiers (with the 
exception of EVE) were trained either directly 
on ClinVar or on other variant annotation data- 
bases with a high degree of content overlap. 
Because they are primarily based on variants 
described in the literature, clinical variant 
databases are subject to ascertainment bias 
(12, 106, 107), which may have contributed to 
supervised classifiers picking up on tendencies 
of human variant annotation that are unre- 
lated to the task of separating benign from 
pathogenic variants (figs. S16, S17, and S18). 
Given the challenges with human annotation, 
we also investigated whether PrimateAI-3D 
could assist in revising incorrectly labeled 
ClinVar variants, by comparing annotations 
in the current ClinVar database and those 
from a September 2017 snapshot. Disagree- 
ment between PrimateAI-3D and the 2017 
version of ClinVar was highly predictive of 
future revision and the odds of revision in- 
creased with PrimateAI-3D confidence (fig. 
S19). Among variants with the 10% most con- 
fident PrimateAI-3D predictions, the odds of 
revision were elevated by a factor of 10 if 
PrimateAI-3D was in disagreement with the 
ClinVar label (P < 10°“). 

The performance of PrimateAI-3D on clinical 
variant benchmarks scaled directly with train- 
ing dataset size, indicating that additional 
primate sequencing data will be the key to 
unlocking further gains (Fig. 4 and fig. S20). 
The current primate cohort already covers 30% 
of all possible synonymous variants in the 
human genome, despite containing only 809 
individuals from 233 species (Fig. 4B). By in- 
creasing the number of species and the num- 
ber of individuals sequenced per species, we 
expect to saturate most of the remaining tol- 
erated substitutions in the human genome 
(fig. S21), including both coding and non- 
coding variation, leaving the remaining dele- 
terious variants to be deduced by a process of 
elimination. 
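
The saturation curve in Fig. 4B is described in its caption as an average over 10 random orderings of the species list. A minimal sketch of that procedure follows; the species names, variant sets, and total number of possible variants are invented, and `cumulative_coverage` is a hypothetical helper name.

```python
import random


def cumulative_coverage(species_variants, total_possible, n_permutations=10, seed=0):
    """Average cumulative fraction of possible variants seen as species are added.

    species_variants: {species_name: set of variant identifiers}.
    Returns a list whose i-th entry is the mean fraction covered by i + 1 species.
    """
    rng = random.Random(seed)
    names = sorted(species_variants)
    sums = [0.0] * len(names)
    for _ in range(n_permutations):
        order = names[:]
        rng.shuffle(order)
        seen = set()
        for i, name in enumerate(order):
            seen |= species_variants[name]
            sums[i] += len(seen) / total_possible
    return [s / n_permutations for s in sums]


# Toy data: three species sharing some variants, 10 possible variants in total.
toy_species = {"species_a": {1, 2}, "species_b": {2, 3}, "species_c": {4}}
curve = cumulative_coverage(toy_species, total_possible=10)
print(curve)
```

Averaging over orderings removes the dependence on which species happens to be added first, so the curve reflects how quickly new species contribute previously unseen variants.
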


Discovery of candidate disease genes for 
neurodevelopmental disorders 


We applied PrimateAI-3D to improve statistical
power for discovering candidate disease
genes that are enriched for pathogenic de novo
mutations in the neurodevelopmental disor- 
ders cohort (fig. S22). De novo missense mu- 
tations from affected individuals in the DDD 
cohort (87) were enriched 1.36-fold above ex- 
pectation, based on estimates of background 
mutation rate using trinucleotide context (47). 
We selected a PrimateAI-3D classification thresh- 
old of 0.821, which called an equal number of 
pathogenic missense mutations (n = 7,238) as 
the excess of de novo missense mutations in 
the cohort (Fig. 5A). Stratifying missense mu- 
tations by this threshold increased enrichment 
of pathogenic de novo missense mutations to 
2.0-fold, substantially increasing statistical 
power for disease gene discovery in the cohort 
(Fig. 5B). 
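
The threshold choice described above can be sketched in a few lines. With a 1.36-fold enrichment, about 1 - 1/1.36 ≈ 26% of the observed de novo missense mutations are in excess of expectation, and the cutoff is placed so that the same number of variants is called pathogenic. The function names and toy numbers below are assumptions for illustration, not the authors' code.

```python
def excess_count(observed, expected):
    """Number of observed de novo mutations above the background expectation."""
    return observed - expected

def threshold_for_excess(scores, n_excess):
    """Score cutoff such that n_excess variants score at or above it
    (assuming no tied scores at the cutoff)."""
    if not 0 < n_excess <= len(scores):
        raise ValueError("n_excess must be between 1 and len(scores)")
    ranked = sorted(scores, reverse=True)
    return ranked[n_excess - 1]

def fold_enrichment(observed, expected):
    """Observed count divided by the count expected from the mutation-rate model."""
    return observed / expected

# Toy data (hypothetical): 8 observed variants where 5 were expected.
scores = [0.95, 0.10, 0.88, 0.40, 0.83, 0.30, 0.65, 0.20]
cutoff = threshold_for_excess(scores, excess_count(observed=8, expected=5))
called = [s for s in scores if s >= cutoff]
```

Enrichment among the called variants is then recomputed as the observed count above the cutoff divided by the expected count above the cutoff.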

By applying PrimateAI-3D to prioritize path- 
ogenic missense variants, we identified 290 
genes associated with intellectual disability 
at genome-wide significance (P < 6.4 × 10^-7) 
(Table 1), of which 272 were previously discov- 
ered genes that either appeared in the Ge- 
nomics England intellectual disability gene 
panel (108) or were already identified in the 
prior study (109) without stratifying missense 
variants (table S5). We excluded two genes, 
BMPR2 and RYR1, as borderline significant 
genes that already had well-annotated non- 
neurological phenotypes. Further clinical studies 
are needed to independently validate this list 
of candidate genes and understand their range 
of phenotypic effects. 


Table 1. Additional genes discovered in intellectual disability. Genes achieving genome-wide significance (P < 6.4 × 10^-7) are shown when considering 
only missense de novo mutations with PrimateAI-3D scores ≥0.821. Counts of protein-truncating and missense DNMs are provided. P values for gene 
enrichment are shown when the statistical test was run only with missense mutations with PrimateAI-3D score ≥0.821 and when it was repeated for all 
missense mutations. 


Discussion 


Our results demonstrate the successful pair- 
ing of primate population sequencing with 
state-of-the-art deep learning models to make 
meaningful progress toward solving variants 
of uncertain significance. Primate population 
sequencing and large-scale human sequencing 
are likely to fill complementary roles in ad- 
vancing clinical understanding of human 
genetic variants. From the perspective of ac- 
quiring additional benign variants to train 
PrimateAI-3D, humans are not suitable, as the 
discovery of common human variants (>0.1% 
allele frequency) plateaus at ~100,000 missense 
variants after only a few hundred individuals 
(7), and further population sequencing into 
the millions mainly contributes rare variants 
that cannot be ruled out for deleterious conse- 
quence. By contrast, because these rare human 
variants have not been thoroughly filtered by 
natural selection, they preserve the potential 
to exert highly penetrant phenotypic effects, 
making them indispensable for discovering 
new gene-phenotype relationships in large 
population sequencing and biobank studies. 
Fittingly, classifiers trained on common primate 
variants may accelerate these target discovery 
efforts by helping to differentiate between 
benign and pathogenic rare variation. 

The genetic diversity found in the 520 known 
nonhuman primate species is the result of 
ongoing natural experiments on genetic vari- 
ation that have been running uninterrupted 
for millions of years. Today, more than 60% of 
primate species on Earth are threatened with 
extinction in the next decade as a result of 
man-made factors (37). We must decide whether 
to act now to preserve these irreplaceable species, 
which act as a mirror for understanding 
our genomes and ourselves, and are each valuable 
in their own right, or bear witness to the 
conclusion of many of these experiments. 


Materials and methods 
Primate polymorphism data 


We aggregated high-coverage whole genomes 
of 809 primate individuals across 233 primate 
species, including 703 newly sequenced samples 
and 106 previously sequenced samples from 
the Great Ape Genome project (19). Samples 
that passed quality evaluation were then aligned 
to 32 high-quality primate reference genomes 
(110) and mapped to the GRCh38 human ge- 
nome build. 

We developed a random forest (RF) classifier 
to identify false positive variant calls and 
errors resulting from ambiguity in the species 
mapping. In addition, we removed variants 
that fell in primate codons that did not match 
the human codon at that position, as well as 
those residing in primate transcripts with likely 
annotation errors. We also devised quality 
metrics based on the distribution of RF scores 
and Hardy-Weinberg equilibrium, and developed 
a unique mapping filter to exclude variants 
in regions of nonunique mapping between 
primate species. 


Identifying differential selection between humans 
and primates through population modeling 


We first established a neutral background dis- 
tribution of mutation rates per gene for each 
primate species by fitting the Poisson Random 
Field model to the segregating synonymous 
variants in each species. The observed number 
of segregating synonymous sites is a Poisson 
random variable, with the mean determined 
by mutation rate, demography, and sample 
size (34). For simplicity, we assumed an equi- 
librium (i.e., constant) demography for all spe- 
cies besides humans; for humans, we used 
Moments (57) to find a best-fitting demographic 
history based on the folded site frequency 
spectrum of synonymous sites. We adopted 
a Gamma distributed prior on mutation rates, 
which also accounts for the impact of GC con- 
tent on mutation rate. We optimized the prior 
parameters through maximum likelihood and 
computed the posterior distribution of the 
mutation rate per gene. 
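
The combination of a Poisson count with a Gamma prior has a closed-form posterior, which the sketch below illustrates. It assumes a single scalar `exposure` that lumps together demography and sample size, and it ignores the GC-content dependence of the prior; both simplifications and all names are assumptions made for illustration rather than the model as fitted in the paper.

```python
def gamma_poisson_posterior(count, exposure, prior_shape, prior_rate):
    """Posterior of a rate mu when count ~ Poisson(mu * exposure)
    and mu ~ Gamma(prior_shape, rate=prior_rate).

    Conjugacy gives Gamma(prior_shape + count, rate=prior_rate + exposure)."""
    return prior_shape + count, prior_rate + exposure

def gamma_mean(shape, rate):
    """Mean of a Gamma distribution in the shape/rate parameterization."""
    return shape / rate

# Hypothetical gene: 12 segregating synonymous sites, exposure of 1000 units.
post_shape, post_rate = gamma_poisson_posterior(
    count=12, exposure=1000.0, prior_shape=2.0, prior_rate=200.0)
posterior_mean = gamma_mean(post_shape, post_rate)
```

The posterior mean lies between the prior mean (2/200 = 0.010) and the raw estimate (12/1000 = 0.012), which is the shrinkage that stabilizes estimates for genes with few variants.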

The number of segregating nonsynonymous 
sites is modeled as a Poisson random variable 
similar to synonymous sites with additional 
selection parameters. We assumed that every 
nonsynonymous mutation in a gene shares the 
same population-scaled selection coefficient 
γ_ig. To explicitly estimate the selection coefficient 
of each gene per species, we devised a 
two-step procedure analogous to an expectation-maximization 
algorithm to control for differences 
in population size across species. 

To identify genes in which human constraint 
is different from nonhuman primate selection, 
we developed a likelihood ratio test to test 
whether population-scaled selection coefficients 
are significantly different between humans and 
other primates. We then assessed whether our 
population genetic modeling improved the cor- 
relation of selection estimates of our primate 
data with previous gene-constraint metrics in 
humans, including pLI (28) and s_het (111). To 
validate the performance of our model, we 
performed population genetic simulations. 
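
To illustrate the general shape of a likelihood ratio test between two groups, the sketch below compares two Poisson rates (for example, variant counts in humans and in other primates, each with its own exposure). This is a deliberately simplified stand-in: the test described above is on population-scaled selection coefficients, and the function name and inputs here are hypothetical.

```python
import math

def poisson_rate_lrt(k1, e1, k2, e2):
    """Likelihood ratio test of H0: equal Poisson rates versus H1: separate rates.

    k1, k2 are counts; e1, e2 are the corresponding exposures.
    Returns (statistic, p_value), with the p-value from a chi-square
    distribution with 1 degree of freedom."""
    pooled_rate = (k1 + k2) / (e1 + e2)

    def term(k, e):
        # Contribution k * log(k / expected under H0); zero counts contribute 0.
        return 0.0 if k == 0 else k * math.log(k / (pooled_rate * e))

    stat = max(2.0 * (term(k1, e1) + term(k2, e2)), 0.0)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

same_stat, same_p = poisson_rate_lrt(10, 100.0, 10, 100.0)
diff_stat, diff_p = poisson_rate_lrt(5, 100.0, 40, 100.0)
```

Identical rates give a statistic of 0 and a p-value of 1, whereas the second call yields a large statistic and a very small p-value.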


Poisson generalized linear mixed modeling 
of selection between humans and primates 


In addition to the population genetics model 
described above, we also applied an orthogonal 
approach to detect differences in selection 
between humans and primates based on missense:synonymous 
ratios (MSR). We fit a Poisson 
generalized linear mixed model (GLMM) 
to the pooled polymorphic synonymous and 
missense mutations across all primates to 
estimate the depletion of missense variants 
in each gene. Then, we fit a second Poisson 
GLMM to the human data, controlling for the 
primate depletion estimates, and compared 
the pooled primate MSR with the human 
MSR for each gene. 


PrimateAI-3D model 


PrimateAI-3D is a 3D convolutional neural net- 
work that uses protein structures and multi- 
ple sequence alignments (MSA) to predict 
the pathogenicity of human missense variants. 
To generate the input for a 3D convolutional 
neural network, we voxelized the protein struc- 
ture and evolutionary conservation in the re- 
gion surrounding the missense variant. The 
network was trained to optimize three objec- 
tives: distinction between benign and unknown 
human variants; prediction of a masked amino 
acid at the variant site; per-gene variant ranks 
based on protein language models. 


Protein structures and multiple 
sequence alignments 


For 341 species, we used vertebrate and mam- 
mal MSAs from UCSC Multiz100 (172, 173) and 
Zoonomia (23). Another 251 species appeared 
in Uniprot for at least 75% of all human pro- 
teins (14). For each protein, alignments from 
all 341+251=592 species were merged. Human 
protein structures were taken from AlphaFold 
DB (June 2021) (73). Proteins that did not 
sequence-match exactly to our hg38 proteins 
(2590; 13.5%) were homology modeled using 
HHpred (7) and Modeller (115). 


Protein voxelization and voxel features 


A regular-sized 3D grid of 7×7×7 voxels, each 
spanning 2 Å × 2 Å × 2 Å, was centered at the Cα 
atom of the residue containing the target 
variant (fig. S11). For each voxel, we provided 
a vector of distances between its center and 
the nearest Cα and Cβ atoms of each amino 
acid type (fig. S11; details in Supplementary 
Text section 1). We also provided additional 
voxel features including the pLDDT confidence 
metric from AlphaFold DB (fig. S12), and the 
evolutionary profile, consisting of each amino 
acid’s frequency at the corresponding posi- 
tion in the 592 species alignment. 
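
The grid construction can be sketched with NumPy as follows. The code builds the 7×7×7 voxel centers around a given center coordinate and computes, for each voxel, the distance to the nearest atom in one atom set; in the setup described above, one such map would be computed per amino acid type and atom kind. Function names and the toy coordinates are assumptions for illustration.

```python
import numpy as np

def voxel_centers(center, n=7, size=2.0):
    """Centers of an n x n x n grid of cubic voxels (edge `size`, in angstroms)
    centered on `center`. Returns an array of shape (n, n, n, 3)."""
    offsets = (np.arange(n) - (n - 1) / 2.0) * size  # -6, -4, ..., 6 for n=7
    gx, gy, gz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1) + np.asarray(center, dtype=float)

def nearest_atom_distance(centers, atom_coords):
    """Distance from every voxel center to the nearest atom in `atom_coords`
    (shape (m, 3)). Returns an array of shape centers.shape[:-1]."""
    atoms = np.asarray(atom_coords, dtype=float)
    diff = centers[..., None, :] - atoms  # broadcast to (n, n, n, m, 3)
    return np.linalg.norm(diff, axis=-1).min(axis=-1)

# Toy example: grid centered on a hypothetical C-alpha position.
grid = voxel_centers([1.0, 2.0, 3.0])
dist = nearest_atom_distance(grid, [[1.0, 2.0, 3.0], [9.0, 2.0, 3.0]])
```

The grid spans 14 Å along each axis, so only atoms within a few residues of the variant site influence the features.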


Model architecture 


The first layers of the PrimateAI-3D model re- 
duce the voxel tensor to a 64-vector through 
repeated valid-padded 3D convolutions with 
a kernel size of 3x3x3. A final hidden dense 
layer transforms this 64-length vector into a 
20-length vector, corresponding to one output 
unit per amino acid at that position. The model 
was trained simultaneously using multiple loss 
functions to optimize the following comple- 
mentary aspects of pathogenicity: 


Benign primate variants 


Using 4.5 million benign missense variants 
from primates, we sampled the same number 
of unknown variants from the set of all pos- 
sible human missense variants, with the dis- 
tribution of mutational probabilities matching 
the benign set, based on a trinucleotide muta- 
tion rate model. Variants for the same protein 
position were combined in a 20-length vector 
(benign: 0, unknown: 1) which was the target 
label for the network. We used mean squared 
error (MSE) as the loss function for non-missing 
labels and ignored missing labels. 
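
A loss that skips missing labels can be written as a masked mean squared error. The sketch below assumes that missing labels are encoded as NaN in the 20-length target vector; that encoding and the function name are illustrative choices, since the text does not state how missing labels are represented.

```python
import numpy as np

def masked_mse(pred, target):
    """Mean squared error over entries whose target is not NaN.

    NaN in `target` marks a missing label (an assumption of this sketch);
    such entries contribute nothing to the loss."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = ~np.isnan(target)
    if not mask.any():
        return 0.0
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Four amino acid slots: one benign (0), one unknown (1), two with no label.
loss = masked_mse(pred=[0.5, 0.9, 0.5, 0.1],
                  target=[0.0, np.nan, 1.0, np.nan])
```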


3D fill-in-the-blank 


We removed all atoms of a target residue be- 
fore voxelization, discarding any information 
about the residue from the input tensor to the 
network. The network was then trained to 
predict a 20-length vector, labeled 0 (benign) 
for amino acids that occur at the target site 
in any of the 592 species and 1 (pathogenic) 
otherwise. All human protein positions with 
at least one possible missense variant were 
included in this dataset. 
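
The labeling rule for this objective follows directly from one column of the multiple sequence alignment. A minimal sketch, assuming the column is given as a string with one character per species and that gap or unknown characters are simply ignored (the amino acid ordering and the function name are arbitrary choices for illustration):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # arbitrary fixed ordering of the 20 residues

def fill_in_the_blank_labels(alignment_column):
    """20-length target vector for one protein position: 0 for amino acids
    observed at this site in any aligned species, 1 otherwise."""
    observed = set(alignment_column)
    return [0 if aa in observed else 1 for aa in AMINO_ACIDS]

# Toy column from six species, one of which has a gap at this site.
labels = fill_in_the_blank_labels("LLIV-L")
```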


Variant ranks from language models 


For each gene, we took the average pathogenic- 
ity ranking from two protein language models, 
PrimateAI language model (PrimateAI LM, 
described below) and our reimplementation 
of the EVE variational autoencoder algorithm 
which we extended to all human proteins (EVE*) 
(67). We calculated the pairwise logistic rank 
loss as described in Pasumarthi et al. (16). 
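
For readers unfamiliar with pairwise ranking losses, the sketch below shows the standard pairwise logistic form: every pair of variants in which one should rank above the other contributes log(1 + exp(-(s_i - s_j))). The convention that a larger target rank means more pathogenic, the unnormalized sum, and the function names are assumptions of this example, not details confirmed by the text.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_logistic_rank_loss(scores, target_ranks):
    """Sum over ordered pairs (i, j) with target_ranks[i] > target_ranks[j]
    of log(1 + exp(-(scores[i] - scores[j])))."""
    loss = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if target_ranks[i] > target_ranks[j]:
                loss += softplus(-(scores[i] - scores[j]))
    return loss

concordant = pairwise_logistic_rank_loss([2.0, 1.0], [2, 1])
discordant = pairwise_logistic_rank_loss([1.0, 2.0], [2, 1])
```

Because only score differences within a gene enter the loss, the model is free to place different genes on a common scale through the other objectives.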


PrimateAI language model 


The PrimateAI language model (PrimateAI LM) 
is an MSA transformer (83) for fill-in-the-blank 
residue classification, which was trained end- 
to-end on MSAs of UniRef-50 proteins (115, 117) 
to minimize an unsupervised masked language 
modelling (MLM) objective (87). Our model 
requires ~50x less computation for training 
than previous MSA transformers as a result 
of several improvements in architecture and 
training (fig. S9). 


Model training procedure 


Each batch had the same number of samples 
from each of the three variant datasets (~33 with 
a batch size of 100). For the language model 
ranks dataset, all 33 samples had to come 
from the same protein. The number of times a 
protein was chosen for a batch was propor- 
tional to the length of the protein. In order to 
make our model robust against protein orien- 
tations, we randomly rotated the protein atomic 
coordinates in 3D before voxelizing a variant. 
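
The rotation augmentation can be sketched with NumPy. One common way to draw a random rotation is to take the QR decomposition of a Gaussian matrix and correct the signs; the text does not say how rotations were sampled, so this particular method, the choice of rotating about the variant's center, and the function names are assumptions.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Random 3x3 rotation matrix (orthogonal, determinant +1) obtained from
    the QR decomposition of a matrix of standard normal entries."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so the draw is unbiased
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def rotate_about(coords, center, rotation):
    """Rotate atomic coordinates (shape (m, 3)) about `center`."""
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    return (coords - center) @ rotation.T + center

rng = np.random.default_rng(0)
rot = random_rotation_matrix(rng)
atoms = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rotated = rotate_about(atoms, center=[1.0, 2.0, 3.0], rotation=rot)
```

Rotating about the grid center leaves all interatomic distances unchanged, so only the orientation of the structure within the voxel grid varies between training samples.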


Model evaluation 


We compared performance of our model and 
other models (84) on variants for which all 
models had scores. Deep mutational scanning 
assays were available for 9 human genes: 
Amyloid-beta (102), YAP1 (96), MSH2 (98), SYUA 
(101), VKORC1 (97), PTEN (99, 100), BRCA1 (104), 
TP53 (103), and ADRB2 (105). For each assay and 
prediction model, we calculated the absolute 
Spearman rank correlation between prediction 
and assay scores. The UKBB dataset (79, 80) 
contains 42 gene-phenotype pairs which were 
significantly associated by rare variant burden 
testing using all rare missense variants, without 
applying missense pathogenicity prioritiza- 
tion. The evaluation was the same as with 
DMS assays, except that correlations were cal- 
culated from the quantitative phenotypes of 
individuals carrying the variant, instead of 
the assay score for the variant. For ClinVar 
(4), we filtered to high-quality 2-star variants 
and evaluated model performance by calcu- 
lating per-gene area under the receiver op- 
erating characteristic curve (AUC). For the 
rare disease cohorts, we collected de novo mis- 
sense mutations from patients with devel- 
opmental disorders (85-87), autism spectrum 
disorders (88-94) or congenital heart disor- 
ders (95). For all three datasets, we compared 
against DNMs from healthy controls (88-93). 
We applied the Mann-Whitney U test to mea- 
sure how well each model’s prediction scores 
could distinguish patient variants from con- 
trol variants. 
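
The Mann-Whitney U statistic and the AUC used for ClinVar are closely related: dividing U by the number of case-control pairs gives the probability that a randomly chosen patient variant scores higher than a randomly chosen control variant. A minimal sketch with hypothetical scores (a production analysis would typically use a statistics library rather than this quadratic loop):

```python
def mann_whitney_u(case_scores, control_scores):
    """U statistic: number of (case, control) pairs in which the case scores
    higher, with ties counted as 0.5."""
    u = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                u += 1.0
            elif c == k:
                u += 0.5
    return u

def auc_from_u(case_scores, control_scores):
    """AUC expressed as U divided by the number of case-control pairs."""
    n_pairs = len(case_scores) * len(control_scores)
    return mann_whitney_u(case_scores, control_scores) / n_pairs

# Hypothetical pathogenicity scores for patient and control variants.
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.1]
u_stat = mann_whitney_u(cases, controls)
auc = auc_from_u(cases, controls)
```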


INTRODUCTION: Genome-wide association studies 
(GWASs) have identified thousands of common 
genetic variants that are predictive of common 
disease susceptibility, but these variants indi- 
vidually have mild effects on disease owing to 
the effects of natural selection. By contrast, 
rare genetic variants can have large effects on 
common disease risk, but their use in genetic 
risk prediction has been limited to date owing 
to the difficulty of distinguishing pathogenic 
from benign variants and estimating the mag- 
nitude of their effects. 


RATIONALE: PrimateAI-3D is a three-dimensional 
convolutional neural network for missense 
variant-effect prediction, which was trained 
with common genetic variants from the pop- 
ulation sequencing of 233 primate species. By 
applying this method to estimate the patho- 
genicity of rare coding variants in 454,712 UK 
Biobank individuals, we aimed to improve rare- 
variant association tests and genetic risk predic- 
tion for common diseases and complex traits. 


RESULTS: We performed rare-variant burden 
tests for 90 well-powered, clinically relevant 
phenotypes in the UK Biobank exome dataset. 
Stratifying missense variants with PrimateAI-3D 
greatly improved gene discovery, revealing 73% 
more significant gene-phenotype associations 
(false discovery rate <0.05) compared with not 
using PrimateAI-3D. When benchmarked against 
prior studies, gene-phenotype pairs identified 
with our method were better supported by or- 
thogonal genetic evidence from GWAS and 
genes from related Mendelian disorders. In 
addition, PrimateAI-3D scores showed the strong- 
est correlation among existing variant interpre- 
tation algorithms for predicting the quantitative 
effects of rare variants on continuous clinical 
phenotypes. 

Polygenic contribution of rare genetic variants to complex human traits, shown for serum cholesterol as 
a representative example. (Left) Rare-variant burden tests capture the direction and effect sizes of genes in 
known lipid biosynthesis pathways. (Top right) When used in a rare-variant polygenic risk score, individuals 
at opposite ends of the PRS separate into high- and low-cholesterol groups. (Bottom right) Rare variants in these 
genes have larger effects compared with common variants identified by GWAS and are strongly predictive of 
individuals who are phenotypic outliers. 

Having validated our method for finding gene- 
phenotype relationships, we next constructed 
a rare-variant polygenic risk score (PRS) model 
by combining the rare-variant genes for 
each phenotype, weighting variants by their 
PrimateAI-3D prediction score and the direction 
and effect size of each associated gene. For 
comparison, we constructed common-variant 
PRS models and evaluated the performance of 
the two models for genetic risk prediction in a 
withheld-test subset of the cohort. Although 
common variants better explained overall pop- 
ulation variance, rare-variant PRSs had more 
power at the ends of the distribution to identify 
individuals at the greatest risk for disease, and 
thus may be more relevant for population ge- 
netic screening and risk management. By con- 
trast to common-variant PRS models derived 
from European populations that show poor 
generalization to non-Europeans, rare-variant 
PRSs were substantially more portable to dif- 
ferent cohorts and ancestry groups that were 
not seen during model training. Moreover, be- 
cause they incorporate orthogonal informa- 
tion from nonoverlapping sets of variants, we 
combined rare- and common-variant PRS mod- 
els into a unified model and observed further 
improvement in genetic risk prediction for 
common diseases. 
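
One simple way to picture a rare-variant PRS of this kind is as a weighted sum over the rare variants an individual carries, where each variant contributes its pathogenicity score multiplied by the signed effect size of its gene. The sketch below is only that picture: the exact weighting used in the study is not given here, and the gene effect sizes and scores are hypothetical.

```python
def rare_variant_prs(carried_variants, gene_effects):
    """Sum of (signed gene effect) x (variant pathogenicity score) over the
    carried rare variants that fall in phenotype-associated genes.

    carried_variants: list of (gene, pathogenicity_score) tuples.
    gene_effects: dict mapping gene name to a signed effect size."""
    total = 0.0
    for gene, score in carried_variants:
        if gene in gene_effects:  # variants in unassociated genes are ignored
            total += gene_effects[gene] * score
    return total

# Hypothetical effect sizes for an LDL cholesterol score.
ldl_gene_effects = {"LDLR": 1.0, "PCSK9": -0.5}
person = [("LDLR", 0.9), ("PCSK9", 0.8), ("TTN", 0.7)]
score = rare_variant_prs(person, ldl_gene_effects)
```

The opposite signs mean that deleterious variants in different genes can offset each other within one individual's score.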

To understand the extent to which rare- 
variant PRSs can be expected to improve with 
increases in discovery cohort size, we repeated 
our analyses in down-sampled subsets of the 
UK Biobank cohort. We found that the number 
of genes contributing to the rare-variant PRS 
increased linearly, with no signs of plateauing 
at a half-million exomes. Newly discovered rare- 
variant genes were strongly enriched at GWAS 
loci, forming allelic series with effect sizes 
that were ~10-fold larger on average than the 
respective common GWAS variant. Among 
well-powered GWAS loci that could be un- 
ambiguously assigned to a single gene, the 
majority showed subthreshold signal on the 
rare-variant burden test, indicating that rare 
penetrant variants exist at a large fraction of 
GWAS loci and can be incorporated into the 
rare-variant PRS with further advances in co- 
hort size and variant effect prediction. 


CONCLUSION: Understanding the impact of rare 
variants in common diseases is of prime interest 
for both precision medicine and the discovery of 
drug targets. By leveraging advances in variant 
effect prediction, we have demonstrated major 
improvements in rare-variant burden testing 
and genetic risk prediction. Notably, we ob- 
served that nearly all individuals carried at 
least one rare penetrant variant for the pheno- 
types we examined, demonstrating the utility of 
personal genome sequencing for otherwise 
healthy individuals in the general population. 


We examined 454,712 exomes for genes associated with a wide spectrum of complex traits and common 
diseases and observed that rare, penetrant mutations in genes implicated by genome-wide association 
studies confer ~10-fold larger effects than common variants in the same genes. Consequently, an 
individual at the phenotypic extreme and at the greatest risk for severe, early-onset disease is better 
identified by a few rare penetrant variants than by the collective action of many common variants with 
weak effects. By combining rare variants across phenotype-associated genes into a unified genetic risk 
model, we demonstrate superior portability across diverse global populations compared with common-variant 
polygenic risk scores, greatly improving the clinical utility of genetic-based risk prediction. 


Genome-wide association studies (GWASs) 
have convincingly identified tens of thousands 
of common variants that underlie 
complex human traits and diseases (1), 
although several key challenges remain. 
First, pinpointing which genes these predom- 
inately noncoding variants affect is nontrivial, 
hindering biological insight into disease mech- 
anisms. Second, individual common variants 
have modest effects on disease risk, which re- 
sults in weak aggregate predictors with lim- 
ited clinical utility and portability between 
populations (2-4). In contrast to GWASs, rare 
coding variant studies directly link perturbed 
gene function to specific phenotypes. For in- 
dividuals with cancer or rare genetic diseases, 
analysis of whole-exome sequencing (WES) 
routinely uncovers rare, highly penetrant var- 
iants that can substantially alter the course of 
clinical management (5-8) and drive treat- 
ment decisions (9, 10). However, in the context 
of common diseases, the role of rare coding 
variants has not been established to the same 
extent owing to a lack of methods for accu- 
rately predicting variant function and insuffi- 
cient cohort sizes. 

Recent large-scale genome and exome se- 
quencing studies of the general population 
have revealed that the average person carries 
dozens of potentially deleterious rare variants 
that have arisen through recent germline mu- 
tation (17). These studies provide the opportu- 
nity to move beyond rare genetic disease and 
examine the impact of medium- to large-effect 
rare coding variants on a comprehensive set of 
complex human traits and diseases. In prac- 
tice, individually rare variants are often com- 
bined into burden tests to more powerfully 
discover genes underlying these phenotypes, 
but these tests are limited by our ability to 
distinguish pathogenic from benign variants. 
In this study, we show that our recently de- 
veloped method PrimateAI-3D (72), a three- 
dimensional (3D) convolutional neural network 
trained on common genetic variants from 
233 primate species, accurately quantifies mis- 
sense variant pathogenicity, resulting in im- 
proved gene discovery across 454,712 individuals 
in the UK Biobank (13-15). We then show how 
rare variants in these genes can be combined 
into a unified genetic risk score, which has dis- 
tinct advantages over common-variant poly- 
genic risk scores, offering a glimpse into the 
potential utility of personal genome sequenc- 
ing for the general population. 


PrimateAI-3D empowers gene discovery 
in rare-variant association tests 


To identify genes underlying complex human 
traits and diseases, we performed rare-variant 
burden tests for 90 well-powered, nonredun- 
dant clinical and quantitative phenotypes, in- 
cluding both medical diagnoses and commonly 
measured laboratory tests, for 454,712 indi- 
viduals in the UK Biobank who underwent 
WES (tables S1 to S3) (6). Using an allele 
frequency (AF) threshold of 0.1%, we detected 
1841 gene-phenotype associations with loss- 
of-function (LoF) variants, 1510 associations 
with missense variants, and 3035 associations 
combining missense and LoF variants (aver- 
age of 33.7 per phenotype) at a false discovery 
rate (FDR) of 5% (Fig. 1A). When we applied 
PrimateAI-3D (72) to classify pathogenic and 
benign missense variants, we improved gene 
discovery by 73%, identifying 1285 more gene- 
phenotype associations at the same FDR (Fig. 
1A, fig. S1, and table S4). As a negative control, 
we repeated the test considering rare syn- 
onymous variants but detected only 28 gene- 
phenotype associations. Taken together, these 
results show that our rare-variant tests are 
well calibrated and that PrimateAI-3D path- 
ogenicity predictions improve gene discovery. 
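
Controlling the false discovery rate across many gene-phenotype tests is commonly done with the Benjamini-Hochberg step-up procedure. The text does not state which FDR procedure was used, so the sketch below is an assumption about the general approach, with hypothetical P values.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking which hypotheses
    are rejected at the requested false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value is under its stepped threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical P values for five gene-phenotype tests.
calls = benjamini_hochberg([0.001, 0.015, 0.5, 0.8, 0.035])
```

Because the procedure is step-up, a test can be rejected even when smaller P values miss their own thresholds, provided a larger P value passes its threshold.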
We undertook several additional approaches 
to validate our gene-phenotype associations 
and to compare them to prior efforts. First, we 
investigated the strength of support from 
common-variant studies for the gene-phenotype 
pairs identified by our approach. After performing 
matched GWASs for the 90 phenotypes 
(table S5) (16), we observed that 70% 
of the 3035 gene-phenotype pairs had a significant 
GWAS variant (P < 5 × 10^-8) within 
1 megabase of the transcription start site. 
Next, we compared our results to a recent rare- 
variant association study in the same UK 
Biobank cohort (17) (Fig. 1B). Backman et al. 
used a burden test which included all LoF 
variants but permitted only missense variants 
predicted to be deleterious by five commonly 
used missense pathogenicity classifiers (78). 
For matched phenotypes and significance 
thresholds (16), we identified 23% more gene- 
phenotype pairs (table S6). Gene-phenotype 
pairs identified exclusively in the present study 
were more enriched for genes implicated by 
matching GWASs and overlapped more with 
genes in related Mendelian diseases (Fig. 1C 
and table S7), which supports their relevance to 
complex-trait biology. Third, we benchmarked 
PrimateAI-3D against 15 other pathogenicity 
classifiers by integrating them into our burden 
testing pipeline. Again, gene-phenotype pairs 
detected exclusively by PrimateAI-3D had con- 
sistently higher enrichments for GWAS genes 
for the same trait compared with any other 
method (fig. S2). Finally, we assessed how well 
each classifier could predict the effect size of 
individual variants on phenotype across 62 
gene-phenotype pairs detected without variant 
prioritization (table S8) (6) and again 
observed that PrimateAI-3D outperformed 
all other methods (median Wilcoxon P = 8 x 
10~’) (Fig. 1D and fig. S3). 

Fig. 1. PrimateAI-3D identifies rare deleterious variants that affect disease 
severity and age of onset. (A) Total number of significant gene-phenotype 
associations (FDR < 5%) identified across 90 phenotypes for rare-variant burden 
tests with different inclusion criteria for variants. As a negative control, the number 
of significant genotype-phenotype associations for a burden test with only 
synonymous variants is also shown. (B) Comparison of the current study with a recent 
study of rare variants in the UK Biobank (17) on the number of gene-phenotype 
associations detected exclusively by one or both studies for the same traits and 
matched significance thresholds. (C) Comparison of rare-variant genes discovered in 
this study versus the previous study (17) with orthogonal genetic evidence. (Left) Fold 
enrichment of rare-variant genes at common-variant GWAS loci, matched for the 
same phenotypes. (Right) Percentage of rare-variant genes overlapping with OMIM 
genes matched for related phenotypes. (D) Performance of different variant 
pathogenicity classifiers (see methods) at predicting variant effects on quantitative 
phenotypes. Spearman correlations between pathogenicity scores and phenotype 
values on a set of 62 gene-phenotype pairs are shown. The phenotypic correlation 
between individuals carrying an identical missense variant is shown in black as an 
upper bound for classifier performance. Dots and error bars represent mean ± 95% 
confidence interval. (E) (Top) Positive correlation of LDL cholesterol concentrations 
(y axis) with PrimateAI-3D scores (x axis) for rare missense variants in LDLR. 
(Bottom) PrimateAI-3D score is predictive of age of onset for dyslipidemia in carriers 
of rare missense variants in LDLR. (F) (Top) Negative correlation of LDL cholesterol 
concentrations with PrimateAI-3D scores for rare missense variants in PCSK9, a 
down-regulator of LDLR. (Bottom) LDL cholesterol concentrations increase with age 
at a similar rate regardless of carrier status, but carriers of prioritized rare variants 
have lower LDL concentrations across all ages. (G) (Top) Positive correlation of 
HbA1c concentrations with PrimateAI-3D scores for rare missense variants in GCK. 
(Bottom) HbA1c concentrations increase with age at a similar rate regardless of 
carrier status, but carriers of rare deleterious variants reach prediabetic thresholds 
earlier in their lives on average. Deleterious and benign missense variants are 
defined as variants with PrimateAI-3D score >0.5 and <0.5, respectively. For (E), (F), 
and (G), red, blue, or yellow lines show regression models fitted to the data. 

Having comprehensively validated our use 
of PrimateAI-3D for rare-variant burden test- 
ing, we explored the correlations we observed 
between PrimateAI-3D scores, clinical labora- 
tory measurements, and ages of onset for com- 
mon diseases. In general, we observed a linear 
relationship with the quantitative measure- 
ments and an inverse correlation with age of 
disease onset (table S9). We focus on the ex- 
amples of LDLR and PCSK9 with low-density 
lipoprotein (LDL) cholesterol levels and GCK 
with glycated hemoglobin A1c (HbA1c) to demonstrate 
these general findings (Fig. 1, E to G). 
Overall, 1307 individuals (0.3%) carried rare, 
potentially deleterious missense variants in 
the LDLR gene in which pathogenic muta- 
tions can cause familial hypercholesterolemia 
and early-onset cardiovascular disease (19, 20). 
PrimateAI-3D scores of missense variants in 
LDLR were significantly correlated with LDL 
levels (Spearman ρ = 0.50, P = 8 x 10°*®) (6). 
Individuals with variants that had scores near 0 
had LDL cholesterol levels indistinguishable 
from noncarriers, whereas those with scores 
near 1 had elevated LDL cholesterol levels sim- 
ilar to LoF variant carriers (Fig. 1E, upper panel). 
Among individuals who received a clinical 
diagnosis of dyslipidemia, PrimateAI-3D scores 
correlated inversely with age of diagnosis 
(Spearman ρ = -0.35, P = 3 x 10 ~). The most 
deleterious missense variants advanced age of 
disease onset by ~15 years, similar to that ob- 
served for LoF carriers (Fig. 1E, lower panel). 

We next examined rare variants in the PCSK9 
gene, a target of cholesterol-lowering medi- 
cations (27). Rare missense variants with high 
PrimateAI-3D scores in PCSK9 were corre- 
lated with decreased LDL cholesterol levels 
(Spearman ρ = -0.32, P = 3 x 10°) and acted 
in the opposite direction of deleterious LDLR 
variants (Fig. 1F, upper panel). LDL cholesterol 
levels increased with age at a similar rate 
(0.2 mmol/liter per decade of normal aging) 
regardless of PCSK9 carrier status, but indi- 
viduals carrying prioritized rare variants in 
PCSK9 had an average of 0.6 mmol/liter-lower 
LDL cholesterol levels at any given age (Fig. 1F, 
lower panel). Consequently, fewer of these car- 
riers had moderate-to-severe hypercholesterolemia 
(LDL cholesterol > 4.1 mmol/liter or 160 mg/dl) 
or elevated cardiovascular disease risk (22), 
whereas those that did manifested these symp- 
toms later in life. 

We also observed similar relationships be- 
tween rare deleterious variants in GCK and 
HbA1c, a proxy for blood glucose levels and a 
diagnostic laboratory marker for type 2 diabetes 
(prediabetes HbA1c > 42 mmol/mol, diabetes 
HbA1c > 48 mmol/mol) (Fig. 1G) (23). 
Analogous to LDL cholesterol, HbA1c levels 
increased with age, matching the steep rise 
of diabetes prevalence with age observed in 
epidemiological studies (24). Rare deleterious 
variants in GCK elevated HbA1c levels by 
an average of 5.1 mmol/mol relative to benign 
variant carriers and noncarriers, which was 
4.6-fold higher than the average rise in HbA1c 
levels per decade of normal aging. Correspond- 
ingly, this increased the fraction of individuals 
with diabetes between ages 40 and 50 from 
3.8% to 24.8% (6.6-fold increase) for carriers of 
rare deleterious variants. Our results across clin- 
ically relevant phenotypes such as LDL choles- 
terol and HbAlc demonstrate the utility of 
PrimateAI-3D to distinguish pathogenic from 
benign variants and highlight the capacity of rare 
high-penetrance variants to accelerate or delay 
the age of onset of common diseases by decades. 


Rare-variant polygenic risk scores identify 
individuals most at risk for common diseases 


Recent exponential human population growth 
has created an abundance of rare variants 
through naturally occurring mutations with- 
out providing adequate time for selection to 
remove those with deleterious consequences 
(25, 26). In the UK Biobank cohort, we ob- 
served that each person carries an average of 
2.96 rare deleterious missense variants and 
0.97 rare LoF variants within one or more of 
the genes identified from our burden test. 
Consistent with models of negative selection 
(16, 27, 28), we find that rare variants exerted 
far greater per-allele effects on human pheno- 
types than common variants across a subset of 
893 genes implicated by both rare- and common- 
variant studies, with rare deleterious variants 
having on average an 11.2-fold larger effect than 
common GWAS variants at the same loci (Fig. 
2A and fig. S4). Within each allele frequency 
bin, LoF variants had the highest per-allele ef- 
fects, followed by missense variants (PrimateAI- 
3D > 0.8) and cryptic splice variants (SpliceAI 
score > 0.2) (29). Benign missense (PrimateAI- 
3D < 0.2) and synonymous variants had nearly 
null per-allele effects on phenotype, even as 
singletons. Given the high overall prevalence 
and strong effect sizes of rare deleterious variants
in the predominately healthy UK Biobank
cohort, we reasoned that a single polygenic score 
combining these variants may effectively iden- 
tify individuals at high risk for complex disease. 
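
The allele-frequency dependence described above is fitted in Fig. 2A with curves of the form β = σ[2p(1 - p)]^α, as stated in the figure legend. The following Python sketch only evaluates that expression. The values of σ and α are hypothetical placeholders (the fitted values are not given in this section); a negative α is assumed here so that rarer alleles carry larger effects.

```python
def per_allele_effect(maf: float, sigma: float, alpha: float) -> float:
    """Evaluate beta = sigma * [2p(1 - p)]^alpha for minor allele frequency p."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("minor allele frequency must be in (0, 0.5]")
    heterozygosity = 2.0 * maf * (1.0 - maf)
    return sigma * heterozygosity ** alpha


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
SIGMA, ALPHA = 0.05, -0.3

for maf in (0.0001, 0.001, 0.01, 0.1):
    effect = per_allele_effect(maf, SIGMA, ALPHA)
    print(f"MAF = {maf:<7} predicted per-allele effect = {effect:.3f}")
```

With any negative α the predicted effect grows as the allele becomes rarer, which is the qualitative pattern the paragraph reports for deleterious variant classes.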

Existing polygenic risk score (PRS) models 
of common disease largely omit rare variants 
because of challenges in interpreting variants 
of uncertain significance (VUS) and estimating 
the magnitude of variant effects (30). Here we 
propose a complementary, rare-variant PRS mod- 
el, based on a weighted sum of rare deleterious 
variants from multiple phenotype-associated 
genes, using PrimateAI-3D for variant effect 
estimation. To construct the model, we first 
split the UK Biobank cohort into training and
testing subsets and then fit a linear model to
each phenotype on the rare variants (AF < 0.1%) 
in associated genes, weighted by PrimateAI- 
3D-predicted effect size (table S10) (16). For 
comparison, we also constructed common- 
variant (AF > 1%) PRS models by performing 
GWAS on the training dataset and applying 
the method of clumping and thresholding 
(table S11) (31). 
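
As a rough illustration of the weighted-sum idea described above, the sketch below scores one individual by summing, over rare variants in phenotype-associated genes, a per-gene weight multiplied by a pathogenicity score. It is a simplified sketch under stated assumptions, not the fitted model from the study: the `Variant` type, the gene weights, and the pathogenicity values are all hypothetical, and the score field merely stands in for a PrimateAI-3D prediction.

```python
from dataclasses import dataclass

RARE_AF_CUTOFF = 0.001  # AF < 0.1%, the rare-variant cutoff quoted in the text


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_frequency: float
    pathogenicity: float  # hypothetical score in [0, 1]


def rare_variant_prs(carried: list[Variant], gene_weights: dict[str, float]) -> float:
    """Sum gene weight x pathogenicity over rare variants in associated genes."""
    score = 0.0
    for variant in carried:
        if variant.allele_frequency >= RARE_AF_CUTOFF:
            continue  # too common to count as a rare variant
        if variant.gene not in gene_weights:
            continue  # gene is not associated with this phenotype
        score += gene_weights[variant.gene] * variant.pathogenicity
    return score


# Illustrative weights only; the signs follow the LDL directions described in the text.
gene_weights = {"LDLR": 1.0, "PCSK9": -0.8}
person = [
    Variant("LDLR", 0.0002, 0.9),
    Variant("PCSK9", 0.0005, 0.5),
    Variant("LDLR", 0.02, 0.9),   # excluded: allele frequency above the cutoff
    Variant("TTN", 0.0001, 0.9),  # excluded: gene not in this hypothetical model
]
print(rare_variant_prs(person, gene_weights))
```

In the study the weights come from a linear model fitted on the training subset; here they are fixed by hand so that the arithmetic is easy to follow.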

We illustrate the components of the rare- 
variant PRS model using total cholesterol lev- 
els as a representative example and show that 
it identifies the complex network of genes, cell 
types, and pathways that underpin lipid me- 
tabolism (Fig. 2B). Rare deleterious variants in 
the 31 associated genes that contribute to the 
rare-variant PRS model shifted cholesterol lev- 
els by ~0.38 mmol/liter on average, 10-fold the 
average effect size of the 563 variants in the 
common-variant PRS model (0.040 mmol/liter) 
(Table 1). Out of these 31 genes, 25 were pre- 
viously known to play central roles in lipid ho- 
meostasis (32): from absorption of cholesterol 
through intestinal enterocytes (ABCG5) (33), to
regulation of serum LDL concentrations (PCSK9)
(34), to forming key components of lipoproteins
(APOB) (35), to lipid scavenging in
macrophages (STAB1) (36). Beyond identifying
genes pertinent to cholesterol metabolism,
the direction of effect for these rare deleterious
variants was consistent with each gene’s known
role in the pathway. Notably, many of the genes
that produce downregulatory effects on cholesterol
levels are therapeutic targets that offer alternatives
to statin-based cholesterol reduction
for cardiovascular disease, such as PCSK9 and
NPC1L1 inhibitors (37, 38). Whereas the average
chance of an individual carrying a rare dele- 
terious variant for any given gene was only 0.4%, 
when summed across all 31 genes, one in eight 
individuals carried a rare, high-penetrance var- 
iant for cholesterol. 
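
The "one in eight" figure can be checked with simple arithmetic from the two numbers quoted above (0.4% per gene, 31 genes). The sketch below computes the plain sum described in the text and, for comparison, the probability of carrying at least one such variant if genes were independent; the independence calculation is an added assumption and is not taken from the source.

```python
PER_GENE_CARRIER_RATE = 0.004  # 0.4% average chance per gene, from the text
N_GENES = 31                   # genes in the cholesterol rare-variant PRS

# Simple sum across genes, as described in the text.
summed = PER_GENE_CARRIER_RATE * N_GENES

# Alternative: chance of at least one variant, assuming independence across genes.
at_least_one = 1.0 - (1.0 - PER_GENE_CARRIER_RATE) ** N_GENES

print(f"summed rate: {summed:.3f} (about 1 in {1 / summed:.1f})")
print(f"at least one, if independent: {at_least_one:.3f} (about 1 in {1 / at_least_one:.1f})")
```

Both routes give a carrier fraction of roughly 12%, consistent with the stated one in eight individuals.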

We sought to evaluate the predictive power 
of the rare-variant PRS and the correspond- 
ing common-variant PRS, as well as a combi- 
nation of the two methods, on the 10% of UK 
Biobank individuals that had been withheld 
for testing. Across 78 quantitative phenotypes, 
the unified PRS performed best with an aver- 
age Pearson correlation of 0.307 (Fig. 2C and 
fig. S5), compared with 0.058 and 0.303 for the 
rare-variant and common-variant PRSs, re- 
spectively. Consistent with the correlations, the 
average phenotypic variance explained was 
10.4, 0.4, and 10.1%, respectively. We also eval- 
uated rare-variant PRS models constructed 
using 15 other variant pathogenicity classifiers 
and observed that PRSs based on PrimateAI- 
3D outperformed all other methods (Fig. 2D), 
underscoring the importance of accurate path- 
ogenicity prediction to rare-variant PRS per- 
formance. Overall, these observations are 
consistent with those of previous studies that 
have demonstrated that, in aggregate, rare 
Fig. 2. Comparison of polygenic risk scores (PRSs) from common and rare variants.
(A) Relationship between variant effect size and allele frequency for different
pathogenicity classes of variants. Synonymous variants are shown as negative controls.
Dot sizes are proportional to the cube root of the number of variants in each group.
Regression fits between the allelic effect size and minor allele frequency are
represented by curves for each pathogenicity class, calculated with the equation
$\beta = \sigma[2p(1 - p)]^{\alpha}$, where $\beta$ is the per-allele effect, $p$ is the
minor allele frequency, and $\sigma$ and $\alpha$ are parameters for selective constraint.
(B) Illustration of the cholesterol pathway. Genes in the rare-variant PRS model are
superimposed. For each gene, values indicate effect sizes in standardized units (see
materials and methods), and triangles indicate direction of effect. (C) Comparison of the
performance of rare-variant PRS, common-variant PRS, and a unified PRS across 78 phenotypes
in the withheld UK Biobank test set. Pearson correlations between PRS predictions and
phenotypes are shown. (D) Comparison of rare-variant PRSs constructed with different
pathogenicity classifiers (see methods). Mean absolute Pearson correlations between PRS
and phenotypes are shown. Dots and error bars represent mean ± 95% confidence intervals.
(E) Enrichment of outlier PRS scores in individuals who are phenotype outliers.
Phenotype-outlier individuals were defined as exceeding a certain z-score cutoff (x axis),
and the y axis shows the enrichment of outlier PRS scores in phenotype-outlier individuals
versus the baseline population, aggregated across 78 phenotypes. (F) Comparison of the
performance of common-variant PRS (x axis) versus rare-variant PRS (y axis) at identifying
individuals at the 90th, 99th, and 99.9th percentiles (left, middle, and right graphs) for
78 quantitative phenotypes. Dashed horizontal and vertical lines represent
Bonferroni-corrected significance thresholds. Lines of equivalence are represented by
dashed diagonal red lines. (G) Number of individuals at high clinical risk for type 2
diabetes (left) and dyslipidemia (right), identified by rare- and common-variant PRSs at
varying risk thresholds (x axis). Rare-variant PRSs identified more individuals at higher
risk (>3.8-fold higher odds for type 2 diabetes, and >4.4-fold higher odds for
dyslipidemia) than common-variant PRSs.


variants explain less genetic heritability than common variants (39).


Table 1. Comparison of effect sizes and frequencies for common PRS variants and rare PRS
genes used for normalized cholesterol concentrations. Chrom., chromosome; freq., frequency.

Although rare-variant PRSs underperformed for average phenotype predictions, we
reasoned that they may outperform common-variant PRSs for identifying individuals at
phenotypic extremes, which is more relevant for clinical screening and risk management.
Indeed, individuals with an outlier phenotype (z-score = 3) were 10-fold more likely than
the overall population to have a rare-variant PRS score in the 0.1st or 99.9th percentile,
compared with threefold for common-variant PRS (P = 0.0026) (Fig. 2E and fig. S6). Across
78 phenotypes, rare-variant PRSs significantly outperformed common-variant PRSs at
identifying individuals with outlier phenotypes at the 99.9th percentile (P = 0.0032), had
comparable performance at the 99th percentile (difference not significant), and
underperformed at the 90th percentile (P = 5.2 x 10~’) (Fig. 2F and fig. S7). Empirically,
the prevalence of many complex human diseases is below 1%, including Parkinson’s disease
(0.3%) (40), multiple sclerosis (0.3%) (41), myocardial infarction before age 40 (0.6%)
(42), and type 1 diabetes (0.2%) (43), which supports the relevance of these outlier
phenotype thresholds for evaluating clinical risk prediction models.
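
The enrichment statistic discussed above compares how often phenotype outliers have an extreme PRS with how often anyone in the population does. The sketch below is one plausible way to compute such a ratio and is not the authors' implementation. It assumes a two-sided outlier definition (|z| at or above the cutoff) and percentile tails defined by rank; both choices, and all function names, are hypothetical.

```python
def percentile_rank(values):
    """Return each value's rank as a fraction in (0, 1]; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(values)
    return ranks


def outlier_enrichment(prs, phenotype_z, z_cutoff=3.0, tail=0.001):
    """Ratio of the extreme-PRS rate among phenotype outliers to the rate in everyone."""
    ranks = percentile_rank(prs)
    # Extreme PRS: bottom `tail` fraction or top `tail` fraction of the cohort.
    extreme_prs = [r <= tail or r > 1.0 - tail for r in ranks]
    # Assumption: outliers are defined two-sidedly by absolute z-score.
    outliers = [abs(z) >= z_cutoff for z in phenotype_z]
    n_outliers = sum(outliers)
    if n_outliers == 0 or not any(extreme_prs):
        raise ValueError("need at least one phenotype outlier and one extreme PRS")
    rate_in_outliers = sum(e and o for e, o in zip(extreme_prs, outliers)) / n_outliers
    baseline_rate = sum(extreme_prs) / len(prs)
    return rate_in_outliers / baseline_rate


# Toy cohort of 10 people with a 10% tail so the arithmetic is easy to follow.
toy_prs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_z = [-3.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
print(outlier_enrichment(toy_prs, toy_z, z_cutoff=3.0, tail=0.1))
```

In the toy cohort the single phenotype outlier also has an extreme PRS, while 2 of 10 people overall do, so the ratio is 1.0 / 0.2.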

For two diseases, type 2 diabetes and dys- 
lipidemia, we evaluated the ability of common 
and rare PRS models to identify individu- 
als exceeding predefined diagnostic clinical 
thresholds (HbA1c > 42 mmol/mol and LDL 
cholesterol > 4.9 mmol/liter, respectively) (Fig. 
2G). Up until approximately fourfold-increased 
odds of disease, the common-variant PRS iden- 
tified more at-risk individuals, whereas after 
this threshold, the rare-variant PRS overtook 
the common-variant PRS. Because the rare- 
and common-variant PRS models use nonover- 
lapping sets of variants, combining them into a 
unified model enables the identification of
significantly more individuals at high disease risk
(odds ratio = 4x) than common-variant PRSs 
alone (type 2 diabetes, 1912 versus 542, P = 1.4 x 
10". dyslipidemia, 7858 versus 6306, P = 1.2 x 
10°’). Taken together, these findings suggest 
that incorporating rare variants into PRSs can 
outperform common-variant PRSs for identi- 
fying outlier individuals (30, 44) who are most 
likely to require treatment or to suffer severe, 
early-onset manifestations of disease and for 
whom preventive screening would be most im- 
pactful (45, 46). Moreover, the ability to point to 
a single penetrant variant as the primary cause 
of the phenotype may increase the potential 
clinical actionability of rare deleterious var- 
iants with respect to prognosis, management, 
and therapeutic interventions (47). 


Portability of rare-variant PRSs and validation 
in an independent, multiancestry cohort 


Common-variant PRS models derived from 
European populations have poor portability
in non-European populations, which may con- 
tribute to future health disparities once adopted 
into clinical practice (4). Even when applied to 
populations with similar ancestry, common- 
variant PRSs have decreased performance 
owing to differences between the cohorts used 
for training and testing (48, 49). We thus set 
out to evaluate the robustness of our rare- 
variant PRSs across independent cohorts and 
ancestries. We first applied 16 rare-variant 
PRS models, which had been trained on UK 
Biobank European-ancestry individuals, to 
predict quantitative phenotypes in 20,708 
European individuals from the Massachusetts 
General Brigham Biobank (MGB; table S12) 
(50). Across 16 phenotypes, the average 
predictive performance of the rare-variant 
PRS model was similar in the two cohorts 
(Pearson’s r = 0.53), with a median phenotype 
correlation of 0.078 between the rare-variant 
PRS and the UK Biobank withheld-test co- 
hort, compared with 0.084 for the MGB cohort 
Fig. 3. Validation of rare-variant PRS performance in diverse human populations.
(A) Performance of rare- and common-variant PRSs derived from UK Biobank
Europeans (EUR), measured in the MGB cohort (left) and in UK Biobank non-Europeans
(non-EUR) stratified by ancestry (right) (AFR, African; EAS, East Asian;
SAS, South Asian). Performance is shown relative to held-out European individuals in
the UK Biobank. P values indicate whether the difference in performance versus
held-out Europeans is significant. (B) Mean phenotype distance between UK Biobank
EUR (x axis) and UK Biobank non-EUR (y axis) individuals is shown for 52 matching
traits. The phenotypic distance is calculated by comparing individuals with low
(<0.5%) and high (>99.5%) rare-variant PRS percentiles. The Pearson correlation is
singletons. P values are shown for comparisons across ancestries with PrimateAI-3D.
The performance of other variant classifiers is also shown for context.


(Fig. 3A). Notably, the rare-variant PRS mod- 
els achieved approximately equal perfor- 
mance in the two cohorts despite 43% of 
the rare deleterious variants in the MGB co- 
hort never appearing in the UK Biobank co- 
hort that was used for model training. Thus, 
unlike common-variant PRSs, rare-variant 
PRSs appear largely portable across cohorts 
with similar ancestry. 

We next evaluated the performance of our 
rare- and common-variant PRS models, which 
had been trained only on individuals of European
ancestry, in individuals of non-European
ancestry from the UK Biobank and MGB. As a 
control, we ensured that the number of var- 
iants used per person in the rare-variant PRS 
was closely matched for different ancestries by 
applying ancestry-specific allele frequency fil- 
ters (AF < 0.1%) (fig. S8) and verified that the 
resulting PRS distributions were similar across 
ancestries (fig. S9). Consistent with previous 
reports, the median common-variant PRS cor- 
relation with phenotype was 84% lower in in- 
dividuals with African ancestry (P = 2.1 x 10), 


62% lower in individuals with East Asian an- 
cestry (P = 3.4 x 10°-*°), and 51% lower in in- 
dividuals with South Asian ancestry (P = 2.5 x 
10 **) relative to the correlation in individuals 
with European ancestry (Fig. 3A). By contrast, 
the rare-variant PRS correlation was substan- 
tially more portable with smaller reductions 
in median correlation of 54%, 14%, and 23%, 
respectively. To assess the portability of the 
rare-variant PRS on a more clinically rele- 
vant task, we selected individuals with PRS 
scores at the upper and lower ends of the 
phenotype distribution (top or bottom 0.5%) 
and observed that the average phenotype dif- 
ferences between the two groups were similar 
for Europeans and non-Europeans in both the 
UK Biobank withheld-test cohort (Pearson’s 
r = 0.85; Fig. 3B) and the MGB cohort (Pearson’s 
r = 0.88; fig. S10). Overall, rare-variant PRS 
models trained in Europeans performed bet- 
ter when tested in non-Europeans than Euro- 
peans for 14 out of 52 phenotypes, compared 
with the common-variant PRS models, which 
performed worse when tested in non-Europeans 
for all 52 phenotypes (Fig. 3C). 

Although rare-variant PRSs appear to gen- 
eralize better across ancestries than common- 
variant PRSs, their average performance 
still decreases in non-European populations. 
However, this appears to be distinct from 
the portability issues experienced by the 
common-variant PRS, where causal-variant 
identification remains difficult because of 
linkage disequilibrium. We hypothesized that 
the current European bias is due primarily 
to more accurate allele frequency estimates 
within the more numerous European indi- 
viduals in the cohort and in current popula- 
tion databases, resulting in the inadvertent 
inclusion of common non-European variants 
into the rare-variant PRS that dilute its per- 
formance. To test this hypothesis, we restricted 
our evaluation to ultrarare variants (seen only 
once in the UK Biobank and absent from the 
TOPMed allele frequency database) to mini- 
mize common-variant leakage. We found that 
PrimateAI-3D variant-effect size predictions 
were equally accurate in European and non- 
European ultrarare variants (difference not 
significant; Fig. 3D) but were significantly 
less accurate for non-European variants at 
the default allele frequency threshold of 0.1% 
(P =1.5 x 10 * with PrimateAI-3D). As further 
indication that these issues are independent 
of variant effect prediction, we show that rare- 
variant PRSs derived with only LoF variants 
(without PrimateAI-3D) displayed similarly 
decreased performance in non-European indi- 
viduals (fig. S11A), and that the European bias 
could be reduced by using L, regularization to 
limit overfitting (fig. S11B). Similar challenges 
have been reported for rare genetic disease 
diagnosis in non-European populations (51, 52), 
where inaccurate allele frequency estimates 
make it difficult to preclude ancestry-specific 
common variants as potential causes of dis- 
ease. Therefore, as population allele frequency 
panels become more accurate and globally in- 
clusive, we expect that the portability of rare- 
variant PRSs will continue to improve. 


The convergence of common- and rare-variant 
genes forecasts future improvements in 
rare-variant PRSs 


Looking forward, we explored how much the 
performance of rare-variant PRS approaches 
is expected to improve as exome sample sizes increase, focusing first on our ability to
identify additional exome-wide significant genes (FDR < 5%). We performed association
tests in down-sampled subsets of the UK Biobank cohort and observed that the number of
significant associations increased linearly with sample size for both rare-variant burden
tests (FDR < 5%) and common-variant GWAS loci
(P < 5 × 10^-8) (Fig. 4A and fig. S12). On average,
PrimateAI-3D enabled discovery of the same 
number of exome-wide significant genes using 
1.8-fold-smaller cohort sizes compared with 
when missense prioritization was not applied. 
Consistent with the improved detection of 
phenotype-associated genes, we observed a 
linear increase in the number of variants car- 
ried by each individual that could be included 
in the rare-variant PRS model (Fig. 4B). At 
the full cohort size, we found that 97% of in- 
dividuals carried a rare penetrant variant in 
one or more of the associated genes for the 
90 clinical and quantitative phenotypes in the 
study (fig. S13). Although effect sizes were lower 
in newly identified genes (Fig. 4C), rare-variant 
PRS performance improved steadily, with each 
doubling of discovery cohort size correspond- 
ing to an 88% improvement in variance ex- 
plained (Fig. 4D and fig. S14). 

Our forecasting analyses suggest that rare- 
variant PRSs will continue to meaningfully 
improve as cohort sizes increase, with newly 
discovered genes preferentially enriched at 
GWAS loci (Fig. 4E), consistent with recent 
work showing convergent biological pathways 
behind both rare- and common-variant herita- 
bility (39). The observed overlap of common- 
variant GWAS hits and rare-variant burden 
test genes was highly phenotype specific (Fig. 
4F and figs. S15 and S16) and was not ex- 
plained by linkage disequilibrium, because we 
regressed out the effects of significant GWAS 
variants and population structure before ap- 
plying the rare-variant burden tests. Focusing 
on a subset of well-powered GWAS loci that 
could be unambiguously mapped to a single 
protein-coding gene (16), we found that 64% of 
common-variant GWAS genes showed signif- 
icant association in the rare-variant burden 
test (P < 0.05; Fig. 4G). The fraction of genes 
with rare-variant signal declined for weaker 
GWAS hits (P = 3 x 10 ®), as well as for genes 
under strong evolutionary selection (P = 5 x 
10 *) (53), reflecting reduced statistical power 
to detect enrichments in genes that either have 
weak phenotypic effects, or that have been 
depleted of deleterious variants by selective 
constraint. Similarly, we observed that shorter 
genes, with consequently fewer variants, were 
also less likely to be significant in the rare- 
variant burden test (P = 7 x 10~°). Although we 
found that only 186 (6%) out of 3097 unam- 
biguously GWAS-implicated genes reached the 
stringent exome-wide significance threshold
for inclusion in the rare-variant PRS (FDR < 
5%), 625 (20%) were nominally significant on 
the burden test at a P-value threshold of < 0.05, 
indicating that rare-variant associations are 
likely to be discovered at these genes with larger 
cohort sizes. Our empirical studies of the con- 
vergence of common- and rare-variant associ- 
ations suggest that allelic series underlie most 
of the genes implicated in human pathophys- 
iology and can be leveraged in ever-growing se- 
quencing cohorts to improve rare-variant PRS 
performance. 


Discussion 


Understanding the role of rare penetrant var- 
iants in common diseases is of prime interest 
to both precision medicine (5-7) and targeted 
drug development (21, 54, 55). In this study, we 
leverage PrimateAI-3D’s state-of-the-art pre- 
dictions to model the quantitative effects of 
each variant on multiple phenotypes, uncov- 
ering the role played by rare penetrant variants 
in common human diseases and complex traits. 
We demonstrate the complementary utility of 
common and rare variants for predicting the 
risk of human diseases, observing that com- 
mon variants explain a higher proportion of 
total population variance, whereas rare var- 
iants more readily identify outlier individu- 
als at the greatest risk for severe, early-onset 
disease (45, 46). Our results establish that the 
personal genome of an otherwise healthy indi- 
vidual is not quiescent with limited actionable 
potential (56) but instead carries a substantial 
burden of rare consequential variants, the clin- 
ical utility of which will be more fully realized 
as variant interpretation improves and discov- 
ery cohort sizes increase. 

At present, the two greatest barriers to the 
clinical adoption of common-variant PRS mod- 
els for use in precision medicine are their lim- 
ited generalizability between populations with 
different ancestries and their weak discrimina- 
tory capability to identify individuals at high 
risk for disease (57). Specifically, the inclusion 
of predominately noncoding variants with 
small effects that are noncausal, but disease- 
associated owing to linkage disequilibrium, 
substantially impairs common-variant PRS 
performance (58, 59). In comparison, our rare- 
variant PRS models are anchored on PrimateAI- 
3D’s predictions of missense variant-effect size 
and are largely uninfluenced by the effects of 
ancestry, because the PrimateAI-3D model 
was derived from common variants in 236 
species of nonhuman primates. This gives rare- 
variant PRS models an advantage over common- 
variant PRS models at generalizing to cohorts 
and human populations that were not seen 
during training, providing more globally equi- 
table health outcomes than current genetic 
studies, which are predominantly European. 
Ultimately, rare-variant PRSs can be combined 
with common-variant PRSs into a unified risk 
Fig. 4. Forecasting the growth of rare-variant associations with increasing cohort size.
(A) Number of significant (FDR < 0.05) genes identified per phenotype with rare-variant
burden tests as a function of the discovery cohort size in thousands of individuals.
Missense prioritization with PrimateAI-3D substantially increased the number of genes
detected at all cohort sizes. Dots and bars represent mean ± standard error.
(B) Number of rare deleterious variants identified per individual as a function of the
discovery cohort size. (C) Average per-variant absolute effect size for newly
associated genes (FDR < 0.05) at each discovery cohort size. The fit from the
regression y = a/x + b is shown. Dots and error bars represent mean ± standard error.
(D) Rare-variant PRS performance increases with increasing discovery cohort size.
Median correlation between the PRSs and the phenotype is shown on the y axis. The number
of genes included in the PRS is represented by the size of each point. (E) Venn diagram
showing the overlap of rare-variant genes with common-variant GWAS loci as a function of
discovery cohort size. (F) A nonsymmetrical heatmap showing the phenotype-specific
overlap of common- and rare-variant associations. Each point shows the statistical
significance of the overlap between common-variant GWAS genes associated with the x axis
phenotype and rare-variant genes associated with the y axis phenotype. The size of the
points represents the magnitude of the enrichment, whereas the color represents the
P value. (G) Percentage of unambiguously mapped GWAS genes with rare-variant
model to significantly improve the identification
of individuals from the general population
who are at increased risk for common diseases. 

Although the rare-variant PRS models pre- 
sented in this work show promise for accu- 
rate identification of high-risk individuals 
across diverse human populations, our study 
has several limitations. At present, rare-variant 
PRS models have limited power; we are only
capable of robustly estimating variant effects 
for well-powered genes, finding 217 GWAS loci 
but only 34 rare-variant genes on average per 
trait. We empirically forecast that the exact 
causal genes underlying most of these GWAS 
loci will be uncovered by rare-variant studies 
with larger cohort sizes and advances in var- 
iant interpretation algorithms (60). Second, 
although interpretation of variants of uncertain
significance remains a challenge, recent
advances that apply deep learning (12, 61), 
high-throughput experimental assays (62), and 
variant information from closely related pri- 
mate species (63) have each demonstrated 
promise toward solving variant interpretation 
on a genome-wide scale. Third, although we 
observed improved portability across ances- 
tries for rare-variant polygenic prediction, more 
accurate allele frequency resources for global 
populations will further shrink the discrepan- 
cies in performance across populations. Indeed, 
systematic efforts to catalog rare variation in 
non-European populations are ongoing (64, 65) 
and will likely precede well-powered common- 
variant GWAS studies in diverse global popu- 
lations (66). Finally, although we only evaluated 
rare-coding or splice-altering variants, improved 
noncoding variant prediction coupled with 
larger sample sizes would likely reveal the 
pervasive phenotypic impacts of rare penetrant 
variants in each person, with transformative 
implications for the utility of clinical whole- 
genome sequencing in the general population. 


Methods summary 
Datasets 


We analyzed data from unrelated individuals 
in the UK Biobank, all of whom had genotypes 
obtained from microarrays and 454,712 of 
whom had genotypes available from exome 
sequencing. The work described in this manu- 
script was approved by the UK Biobank under 
application no. 33751. In addition, we performed 
validation experiments with 20,708 individuals 
from the MGB Biobank. 


Phenotype processing 


Quantitative traits were standardized by in- 
verse rank normal-transformation and adjusted 
for medication usage and further covariates 
including age, sex, ancestry, diet, and others. 
Binary traits were adjusted for age, age², sex,
age × sex, age² × sex, and ancestry.
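
For readers unfamiliar with the inverse rank normal transformation mentioned above, the sketch below shows one common rank-based variant. The source does not specify the exact formula, so the Blom offset (3/8) and the mean-rank handling of ties are assumptions, and the covariate adjustment described in the text is not included.

```python
from statistics import NormalDist


def inverse_rank_normal(values, offset=3.0 / 8.0):
    """Map values to standard-normal quantiles by rank (ties share the mean rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    normal = NormalDist()
    # Blom-style rescaling keeps every quantile strictly between 0 and 1.
    return [normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]


# Hypothetical raw measurements; the middle-ranked value maps to approximately 0.
print(inverse_rank_normal([10.0, 2.0, 7.0]))
```

The transformation depends only on ranks, so skewed or heavy-tailed trait distributions end up on a common standard-normal scale.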


Common-variant associations 


GWAS were performed with common variants 
(AF > 1%) in individuals of European ancestry 
in the UK Biobank and causal gene sets were 
derived by linkage disequilibrium between 
independent GWAS-significant variants (P <
5 × 10^-8) and coding variants, splicing variants
or expression quantitative trait loci (eQTLs) in 
nearby genes or by proximity with local tran- 
scription start sites. 


Rare-variant associations 


Burden tests were performed with rare var- 
iants (AF < 0.1%) on individuals from all eth- 
nicities by searching for combinations of allele 
frequencies and missense pathogenicity scores 
per gene and further calibrating via permuta- 
tions to maximize significance prior to FDR 
correction. Significant gene-phenotype pairs 
were reported at 5% FDR after correction for 
multiple hypothesis testing across all auto- 
somal protein coding genes in the human ge- 
nome and across all tested traits. Rare-variant 
results generated in this study were compared 
with results from a recent well-powered rare-variant
analysis in the UK Biobank (17) by examining
the overlap of significant genes, along
with the enrichment of GWAS and clinically 


Fiziev et al., Science 380, eabo1131 (2023) 


2 June 2023 


PRIMATE GENOMES 


relevant genes. Multiple missense classifiers 
were considered for pathogenicity prediction 
in the burden tests, including BayesDel (67), 
CADD (68), ClinPred (69), DEOGEN2, EVE* 
(61), FATHMM-XF (70), M-CAP (71), MetaLR 
(72), MetaSVM (72), MutationAssessor (73), 
Polyphen-2 (74), PrimateAI-3D (12), PROVEAN 
(75), REVEL (76), SIFT (77), and VEST4 (78). 
Scores for the EVE-style variational autoencoder 
(EVE*) were generated by reimplementing 
the method. The different classifiers were com- 
pared via Spearman correlation with the aver- 
age phenotype values of the carriers of each 
qualifying missense variant in high-confidence 
associated gene-phenotype pairs. 
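
The burden-test results above are reported at 5% FDR, but this section does not name the correction procedure. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up rule, which is an assumption here; the permutation-based calibration described in the text is not modeled.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which hypotheses pass Benjamini-Hochberg at `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= fdr * k / m.
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= fdr * rank / m:
            largest_passing_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[index] = True
    return significant


# Hypothetical P values for four gene-phenotype pairs.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Every hypothesis ranked at or below the largest passing rank is accepted, even if its own P value missed its individual threshold; that is what distinguishes the step-up rule from a per-test cutoff.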


Polygenic risk scores 


PRS models were constructed from GWAS and 
burden test results from training datasets. 
Common-variant (AF >1%) PRS models were 
constructed by applying the method of clumping
and thresholding (31). By contrast, rare-variant
PRS models were constructed by fitting
linear models to each phenotype on the rare 
variants (AF < 0.1%) in significantly associ- 
ated genes, weighted by predicted missense 
pathogenicity. A unified PRS model was also 
constructed, which summed the rare- and 
common-variant PRS models per individual. As 
with the burden test results, rare-variant PRS 
performance was evaluated using PrimateAI- 
3D and other classifiers across 78 traits. The 
overlap of individuals at phenotypic and PRS 
extremes was examined to further elucidate 
PRS performance. For two traits, HbA1c and
LDL cholesterol, clinical risk prediction was
assessed, since clinically diagnostic thresholds
could distinguish cases from controls. PRS portability
was assessed in two ways: first between
cohorts, by applying models constructed in the 
UK Biobank to the MGB Biobank, and sec- 
ond between ancestries, by comparing the per- 
formance between different ancestry groups 
in the UK Biobank. 
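
To make the common-variant and unified scores described above more concrete, the sketch below shows a generic greedy form of clumping and thresholding followed by the per-individual sum of the two scores. It is a simplified illustration, not the pipeline used in the study: the variant identifiers, thresholds, and the `ld_r2` lookup are hypothetical, and the physical-distance window used by real clumping tools is omitted.

```python
def clump_and_threshold(variants, ld_r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy C+T: keep the most significant variant, skip its LD neighbours, repeat.

    variants: dict of variant id -> (p_value, effect_size)
    ld_r2: callable (id_a, id_b) -> squared correlation between two variants
    Returns a dict of retained variant id -> effect_size, used as PRS weights.
    """
    candidates = sorted(
        (vid for vid, (p_value, _) in variants.items() if p_value <= p_threshold),
        key=lambda vid: variants[vid][0],
    )
    kept = {}
    for vid in candidates:
        # Retain a variant only if it is not in LD with any already-kept lead variant.
        if all(ld_r2(vid, lead) < r2_threshold for lead in kept):
            kept[vid] = variants[vid][1]
    return kept


def common_variant_prs(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) over retained variants."""
    return sum(weight * dosages.get(vid, 0) for vid, weight in weights.items())


def unified_prs(rare_score, common_score):
    """Unified score as the per-individual sum of the two PRSs, as in the text."""
    return rare_score + common_score


# Hypothetical summary statistics and LD structure.
gwas = {"rs1": (1e-20, 0.10), "rs2": (1e-12, 0.08), "rs3": (1e-9, -0.05), "rs4": (0.01, 0.30)}
ld_table = {frozenset(("rs1", "rs2")): 0.9}
weights = clump_and_threshold(gwas, lambda a, b: ld_table.get(frozenset((a, b)), 0.0))
print(sorted(weights))
```

Here `rs2` is dropped because it is in strong LD with the more significant `rs1`, and `rs4` is dropped because it misses the P-value threshold.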


Forecasting analysis 


Growth projections of rare- and common-variant 
associations, PRS performance, and overlap of 
significantly associated genes from rare and 
common variants were made from randomly 
down-sampled data ranging from 20% to 100% 
of the whole UK Biobank exome cohort with 
20% increments. Full materials and methods are 
available in the supplementary materials (16). 
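
The down-sampling scheme described above can be illustrated with a short sketch. The function name, the fixed seed, and the choice to draw each subset independently (rather than nesting smaller subsets inside larger ones) are assumptions made here; the text specifies only random down-sampling from 20% to 100% in 20% increments.

```python
import random


def downsample_cohorts(sample_ids, fractions=(0.2, 0.4, 0.6, 0.8, 1.0), seed=0):
    """Draw one random subset of the cohort for each requested fraction."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    subsets = {}
    for fraction in fractions:
        size = round(len(sample_ids) * fraction)
        # Sampling without replacement; each fraction is drawn independently.
        subsets[fraction] = rng.sample(sample_ids, size)
    return subsets


# Hypothetical cohort of 1,000 sample identifiers.
cohort = list(range(1000))
subsets = downsample_cohorts(cohort)
subset_sizes = [len(subsets[f]) for f in (0.2, 0.4, 0.6, 0.8, 1.0)]
```

Each subset would then be run through the same association and PRS steps, and the resulting counts or performance values plotted against cohort size to extrapolate growth.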
EDITORIAL 


It matters who does science 


Scientific research is a social process that occurs 
over time with many minds contributing. But 
the public has been taught that scientific insight 
occurs when old white guys with facial hair get 
hit on the head with an apple or go running out 
of bathtubs shouting “Eureka!” That’s not how it 
works, and it never has been. Rather, scientists 
work in teams, and those teams share findings with oth- 
er scientists who often disagree, and then make more 
refinements. Then those findings are placed in the sci- 
entific record for even more scientists to examine and 
produce further adjustments. Eventually, theories be- 
come knowledge. All along the way, these scientists are 
conspicuously and magnificently human—with all the 
assets and flaws that humans possess. And that means 
that who those individuals are, and the backgrounds 
they bring to their work, have a 
profound influence on the quality 
of the end result. 

It has somehow become a con- 
troversial idea to acknowledge that 
scientists are actual people. For 
some, the notion that scientists are 
subject to human error and frailty 
weakens science in the public eye. 
But scientists shouldn’t be afraid 
to acknowledge their humanity. In- 
dividual scientists are always going 
to make a mistake eventually, and 
the objective truth that they claim to be espousing is al- 
ways going to be revised. When this happens, the public 
understandably loses trust. The solution to this prob- 
lem is doing the hard work of explaining how scientific 
consensus is reached—and that this process corrects for 
the human errors in the long run. 

A raging debate has set in over whether the back- 
grounds and identities of scientists change the out- 
comes of research. One view is that objective truth 
is absolute and therefore not subject to human influ- 
ences. “The science speaks for itself” is usually the 
mantra in this camp. But the history and philosophy 
of science argue strongly to the contrary. For example, 
Charles Darwin made major contributions to the most 
important idea in biology, but his book The Descent of 
Man contained many incorrect assertions about race 
and gender that reflected his adherence to prevalent 
social ideas of his time. Thankfully, evolution didn’t 
become knowledge the day Darwin proposed it, and 
it was refined over the decades by many points of 
view. More recently, pulse oximeters that measure 
blood oxygen levels were found to be ineffective for 
dark skin because they were initially developed for 
white patients. These examples—and countless more 
in between—reveal how much work needs to be done 
to strengthen the scientific community and the public 
understanding of the process. 

A monolithic group of scientists will bring many of 
the same preconceived notions to their work. But a 
group of many backgrounds will bring different points 
of view that decrease the chance that one prevailing set 
of views will bias the outcome. This means that scien- 
tific consensus can be reached faster and with greater 
reliability. It also means that the applications and im- 
plications will be more just for all. How is this a threat 
to scientific rigor and the merit of discoveries? Unfor- 
tunately, we’re nowhere close to achieving these goals. 
Science has had enormous trouble 
building a workforce that reflects 
the public it serves. And now, nu- 
merous state governments are try- 
ing to make it more difficult, if not 
impossible, at the public universi- 
ties in their states, and even within 
the scientific community, there are 
efforts to derail the idea that it 
matters who does science. 

The soundbite “trust the sci- 
ence” has been circulating recently. 
This framing is unfortunate. Be- 
cause “the science” in this context is usually a snapshot 
of ideas or facts in a particular moment—and often 
from the perspective of a small number of people (or 
even one person). It would have been better to use a 
phrase like “trust the scientific process,” which would 
imply that science is what we know now, the product 
of the work of many people over time, and principles 
that have reached consensus in the scientific commu- 
nity through established processes of peer review and 
transparent disclosure. 

Scientists should embrace their humanity rather 
than pretending that they are a bunch of automatons 
who instantly reach perfectly objective conclusions. 
That will be more work both in terms of ensuring that 
science represents that humanity and in explaining 
how it all works to the public. But in return, society 
will get better and more just science, and it will al- 
low scientists to immerse themselves in the glorious, 
messy process of always striving for a greater under- 
standing of the truth*. 

-H. Holden Thorp 


*The text was previously posted as a blog at https://www.science.org/content/blog-post/it-matters-who-does-science. 


SCIENCE science.org 


2 JUNE 2023 * VOL 380 ISSUE 6648 


H. Holden Thorp 
Editor-in-Chief, 
Science journals. 
hthorp@aaas.org; 
@hholdenthorp 


10.1126/science.adi9021 




IN BRIEF 


Edited by Katie Langin 


WATER POLICY 


U.S. wetland protections curtailed 


In a decision that reduces federal protections for wetlands, 
the U.S. Supreme Court last week narrowed 
the definition of marshy areas covered by the Clean 
Water Act. A five-justice majority led by Justice 
Samuel Alito ruled the law applies only to wetlands 
that have a “continuous surface connection” to 
nearby regulated waters, rejecting an approach 
currently used by federal agencies that wetlands 
require only a “significant nexus.” Four 
justices led by Justice Brett Kavanaugh argued 
the new standard is too narrow, ignores the law’s intent 
to protect wetlands “adjacent” to waterways, and will 
undermine efforts to prevent pollution and habitat destruction. 
Researchers filed amicus briefs in favor of the 
current standard; some estimate the ruling will end federal 
regulation of some 18 million hectares of wetlands, 
or about half of the previously protected area. 
Many scientists blasted the decision, saying it 
ignores the complexity of wetland hydrology. 
President Joe Biden said it “defies the science.” 


Wetlands without 
a surface connection 
to other waters 
will lose protection. 


Defining Long Covid 


COVID-19 | A team of scientists says it 
has nailed down the major symptoms of 
Long Covid, a condition that has disabled 
millions of people who were infected 
with SARS-CoV-2. Analyzing reports from 
about 2000 people with Long Covid, as 
well as more than 7000 without—most of 
whom were also previously infected by 
the coronavirus—the researchers identified 
12 key symptoms, including brain fog, 
postexertional fatigue, chest pain, and dizziness. 
Other studies have reported similar 
findings, but this one, which was published 
last week in The Journal of the American 
Medical Association, aims to develop a 
standardized definition and proposes a 
points system to help doctors accurately 
diagnose the syndrome. The work is part 
of RECOVER, a $1.15 billion Long Covid 
project funded by the U.S. National 
Institutes of Health. 


UAE plans asteroid mission 


PLANETARY SCIENCE | The United Arab 
Emirates (UAE) announced this week it 
will build a spacecraft to explore seven 
asteroids located between Mars and 
Jupiter. Scheduled to launch in 2028, 
this will be the UAE’s second planetary 
mission, following the success of its Hope 
spacecraft, currently in orbit around Mars. 
The new spacecraft, the MBR Explorer— 
named after Dubai’s ruler Sheikh 
Mohammed bin Rashid Al Maktoum—will 
end its tour in 2034 by deploying a small 
lander at Justitia, an odd reddish asteroid 
that may be covered in organic substances. 
The UAE is building the MBR Explorer in 
collaboration with planetary scientists at 
the University of Colorado Boulder. 


WHO urges food fortification 


HEALTH POLICY | The World Health 
Organization (WHO) adopted a resolu- 
tion on 29 May urging member countries 
to fortify staple foods with folic acid to 
prevent conditions such as spina bifida, 
which is caused by a lack of the key vita- 
min in the first weeks of pregnancy. The 
resolution, which was adopted unani- 
mously, noted that the benefits of folic 
acid fortification are backed by scientific 
evidence. Only 69 of WHO’s 194 member 
countries—including Australia, Canada, 
the United States, and Colombia— 
currently mandate folic acid fortification. 
The resolution also calls for countries to 
consider fortifying foods with iodine, zinc, 
calcium, iron, and vitamins A and D to 
prevent conditions such as anemia, blindness, 
and rickets. 


Stanford University social psychologist Jennifer Eberhardt on NPR. Eberhardt and her colleagues 
published a study this week examining the fears of Black drivers who were stopped by the police in a U.S. city. 


X-rays probe lone atoms 


MATERIALS SCIENCE | For a century, 
scientists have studied materials by 
measuring the x-rays they absorb. Now, 
researchers have applied absorption 
spectroscopy to individual atoms, they 
report this week in Nature. The team 
positioned a metal tip a few atoms wide 
less than 1 nanometer above organic mol- 
ecules that contained iron and terbium 
atoms. Then they bathed the sample in 
x-rays, which excited the metals’ electrons. 
When the tip was hovering directly over 
a metal atom, excited electrons popped 
from the atom to the tip while others 
flowed in from the gold surface below to 
replace them. By tracking how the flow of 
electrons varied with the x-rays’ energy, 
the team determined the ionization state 
of the metal atoms and how they were 
bonded to other atoms in the molecules. 


COVID-19 study under fire 


RESEARCH INTEGRITY | A study on the 
effects of the malaria drug hydroxychloro- 
quine and the antibiotic azithromycin in 
COVID-19 patients, led by the controver- 
sial French microbiologist Didier Raoult, 
is drawing criticism from medical groups. 
In an open letter published in Le Monde 
on 28 May, 16 French medical societies 
and research organizations slammed it as 
“the largest known unauthorized clinical 
trial to date” and urged authorities 
to take action. The 30,000-patient study 
prescribed drugs long after they had been 
shown to be ineffective and didn’t adhere 
to regulations, critics wrote. French 
authorities say the study will be included 
in an ongoing investigation of research 
at the University Hospital Institute 
Méditerranée Infection, where Raoult 
served as director until stepping down in 
September 2022. Raoult denies that the 
study flouted regulations, saying it was a 
retrospective analysis of patient data, not 
a clinical trial. 
Bad software doomed Moon probe 


LUNAR SCIENCE | The crash of a Japanese 
lunar lander on 25 April resulted from 
a software glitch that miscalculated the 
craft’s altitude, causing it to exhaust its 
fuel and fall roughly 5 kilometers to the 
surface of the Moon, the probe’s developer 
announced last week. The Japanese 
company, called ispace, hoped its Hakuto-R 
Mission 1 would be the first successful 
commercial landing on the Moon. 
Company officials said the problem should 
be fixed in time to keep missions planned 
for 2024 and 2025 on schedule. 


Gene-edited crops draw scrutiny 


AGRICULTURE | The U.S. Environmental 
Protection Agency announced on 25 May 
that it will require companies to submit 
data on crops that have been gene edited 
to resist pests before they go to market. 
Until now, the agency required evaluation 
only of transgenic crops, containing genes 
from other organisms. Gene-edited crops 
will be exempt from a detailed review if the 
changes could have been achieved through 
conventional breeding. But the American 
Seed Trade Association says the extra 
paperwork will still be burdensome. 


MARINE SCIENCE 


Biodiversity tallied as deep-sea mining looms 


The Clarion-Clipperton Zone, a deep-sea region in the eastern Pacific Ocean that 
is twice the size of Argentina and is threatened by commercial mining, is 
home to more than 5000 benthic species, but only 436 have been fully described 
and named. Scientists reported the tally in Current Biology last week after analyzing 
100,000 records of specimens collected during research cruises. Most 
of the unnamed species are crustaceans and marine worms. The zone is littered with 
naturally occurring polymetallic nodules, which contain nickel, cobalt, and other 
elements that are in high demand for electric vehicles. The International Seabed 
Authority has approved 17 mining exploration contracts in the region and expects to 
release regulations for deep-sea mining next month. 


This anemone 
lives in a deep-sea 
region rich in 
valuable metals. 
Researchers have collected an unprecedented amount of peridotite, a kind of mantle rock, from below the sea floor. 


Ocean drillers exhume a bounty of mantle rocks 


Deep cores fulfill 60-year-old quest and could yield science bonanza 


By Paul Voosen 


In 1961, geologists off the Pacific coast 
of Mexico embarked on a daring journey 
to a foreign land—the planet’s interior. 
From a ship, they aimed to drill 
through the thin veneer of Earth’s crust 
and grab a sample of the mantle, the 
2900-kilometer-thick layer of dense rock 
that fuels volcanic eruptions and makes up 
most of the planet’s mass. The drill only got 
a couple hundred meters below the seabed 
before the project foundered under spiral- 
ing costs. But the quest—one of geology’s 
holy grails—remained. 

Researchers onboard the JOIDES Reso- 
lution, the flagship of the International 
Ocean Discovery Program (IODP), said last 
month that they have finally succeeded. 
Drilling below the seabed in the mid- 
Atlantic Ocean, they have collected a core of 
rock more than 1 kilometer long, consisting 
largely of peridotite, a kind of upper mantle 
rock. Although it’s not clear how pristine 
and unaltered the samples are, it is certain 
the cylinders of gray-green rock present an 
unparalleled new record, says Susan Lang, 
a biogeochemist at the Woods Hole Oceano- 
graphic Institution and a co-lead of the 
cruise. “These are the types of rock we’ve 
been hoping to recover for a long time.” 

Researchers on land are eagerly following 
the ship’s daily scientific logs as 
it continues to drill, says Jessica Warren, 
a mantle geochemist at the University of 
Delaware. “Getting down to this really 
fresh stuff has been a dream for decades 
and decades,” she says. “We're finally going 
to see the Wizard of Oz.” 

The samples can help answer a host of 
questions, says Johan Lissenberg, an ig- 
neous petrologist from Cardiff University 
onboard the ship. They can provide direct 
evidence for how ocean crust differs in com- 
position from the upper mantle and bet- 
ter estimates of elemental abundances in 
the planet’s primary reservoir of rock. The 
samples of mantle will also help research- 
ers understand how magma melts out of the 
mantle and rises through the crust to drive 
volcanism, Lissenberg says. “This could be 
a whole step forward for understanding 
magmatism—and the global composition of 
the bulk Earth.” 

The 1961 project, called Project Mohole, 
was the first of a handful of unsuccessful at- 
tempts to reach the mantle. It was named 
after the Mohorovičić discontinuity, or 
“Moho,” a geophysical boundary defined 
by a sudden spike in the speed of seismic 
waves where the crust, a mélange of rocks 
crystallized out of mantle melt and altered 
by water, gives way to the more homogeneous 
mantle. The Moho lies some 
35 kilometers below thick continental crust. 
But it is only about 7 kilometers below 
ocean crust. And it is shallower still at the 
drilling site of the JOIDES Resolution at the 
Mid-Atlantic Ridge, where the North Amer- 
ican and Eurasian tectonic plates are being 
stretched apart, forcing the mantle upward. 

Recovering a long mantle core was not the 
primary goal of the cruise, which is probing 
the Atlantis Massif, an underwater moun- 
tain, for clues to the origin of life. The massif 
rocks contain lots of olivine, a mineral that 
reacts with water in a process called serpentinization. 
The reactions generate hydrogen, 
which serves as an energy source for microbial 
life at the “Lost City,” a nearby complex 
of ocean-bottom mineral chimneys depos- 
ited by gushers of superheated water. 

It’s long been theorized that life could 
have originated in such settings, which are 
rich in organic molecules. The cruise aimed 
to deepen a previously drilled 1.4-kilometer- 
deep hole, pushing to a depth too hot for 
life, where organic compounds that might 
have provided the raw material for the ear- 
liest life might lurk. But progress was slow. 

So the ship returned to another site near 
Lost City, where shallow cores drilled in 
2015 had found what appeared to be man- 
tle rocks highly altered by seawater. After 
punching through a horizontal fault near 
the seabed, “the drilling just went so magically 
well,” says Andrew McCaig, a geologist 
at the University of Leeds and the cruise’s 
other chief scientist. The only hiccup came 
when the recovered peridotite rocks con- 
tained veins of asbestos, prompting in- 
creased safety protocols. 

There's still some room for debate about 
whether the rocks are a true sample of the 
mantle, says Donna Blackman, a geophysicist 
at the University of California, Santa Cruz. 
The seismic speedup at the Moho is thought 
to reflect the lack of water or calcium and 
aluminum minerals in mantle rocks. Be- 
cause the samples still show some influence 
of seawater, Blackman says she might clas- 
sify them as deep crust. “But the petrology 
is interesting and special regardless,” she 
says. And as the team continues drilling 
into deeper rocks, Lissenberg says, “They’re 
getting fresher.” 

Indeed, it appears the team is already 
sampling mantle rock that has never 
melted into magma, which then cools and 
crystallizes into different kinds of crustal 
rocks, says Vincent Salters, a geochemist at 
Florida State University. By capturing the 
source rock, he says, researchers should be 
able to learn how magma melts, flows, and 
separates—clues to the workings of volca- 
noes worldwide. 

The rocks could also answer other ba- 
sic questions, such as how much the lavas 
collected at midocean ridges—which are 
often taken as a stand-in for the mantle— 
differ from the mantle itself, says James 
Day, a geochemist at the Scripps Institution 
of Oceanography. The abundance of radio- 
active elements in the rocks could improve 
estimates of how much heat the mantle 
produces as a whole, driving the deep con- 
vective motions that are the engine of plate 
tectonics. And their physical strength can 
inform studies of how earthquakes fracture 
and propagate in the upper mantle. 
The cores could also help clarify how well 
the mantle is mixed, reincorporating ingredients 
from the continental crust that 
is drawn back into Earth’s interior at deep 
ocean trenches. “There’s so much more to 
this than understanding a little piece of 
ocean floor,” Day says. 


Drilling was conducted aboard the JOIDES Resolution, a U.S. ship slated to be retired next year. 

Research on the rocks has already begun 
in labs onboard the JOIDES Resolution, 
and eventually the cores will be available 
at IODP repositories for all. But for all the 
excitement over the rock samples, the mo- 
ment is bittersweet: The expedition may 
be one of the last for the ship. In March, 
the National Science Foundation (NSF) an- 
nounced that, because of cost increases and 
a lack of a deal with its international col- 
laborators, it will end its operating contract 
for the ship in September 2024. 

The ship is in great condition and 
could continue until 2028, says Anthony 
Koppers, an associate vice president at Or- 
egon State University and a leader in the 
IODP community. There’s still a slim pos- 
sibility that the U.S. Congress will fund an 
extension, he says. But NSF has no plan yet 
to develop a successor ship. And the other 
two big contributors to IODP, Europe and 
Japan, are moving on. This month, they 
announced the creation of IODP³, a new 
global drilling program that will make 
heavy use of Japan’s drill ship, the D/V 
Chikyu, which in the past has operated 
mostly in waters near Japan. 

This was Lang’s first cruise on the JOIDES 
Resolution, and she was astonished by the 
capabilities of its labs and the knowledge of 
its technical staff. The success they’re hav- 
ing testifies to their decades of experience 
probing beneath the ocean floor, she says. 
“It’s so unfortunate that something like this 
is going to be lost.” 
U.S. debt 
deal clouds 
funding hopes 


Civilian programs take 
a back seat to defense in 
averting default 


By Jeffrey Mervis 


An agreement struck last weekend 
between President Joe Biden and 
House Speaker Kevin McCarthy (R-CA) 
to avoid a U.S. government default 
has reassured jittery financial 
markets. But its formula for holding 
federal spending flat for 2 years means science 
agencies will have to compete against all 
other civilian programs to win any increases 
from Congress. 

Such a zero-sum game would mark a re- 
turn to the rules under which Congress op- 
erated for a decade ending in 2021, which 
limited but did not halt growth in research 
spending. Some research advocates predict it 
will be hard to win any sizable increases for 
science given everything else the government 
must fund. 

“I think we’re looking at a status quo budget 
for FY [fiscal year] 2024,” says Matthew 
Hourihan of the Federation of American Sci- 
entists. “And when you factor in inflation, 
that means a real cut for most programs.” 

The 27 May agreement would allow the 
U.S. government to continue to borrow 
money for its operations after 5 June, when it 
is expected to reach the current debt ceiling 
of $31.4 trillion. The deal strikes a compro- 
mise between Republican demands for deep, 
sustained cuts in federal spending in return 
for raising the ceiling and Biden’s effort to 
protect federal programs. It would essentially 
hold the pot of money that funds all non- 
defense discretionary spending at its current 
level of $638 billion in FY 2024, which begins 
1 October, rather than the 7% increase Biden 
has requested. Defense spending would 
match his request by growing 3%. 

The 2024 number for civilian programs, 
although flat, allows for some new spend- 
ing by including tens of billions of dollars 
appropriated this year but not yet used. The 
unspent funds include money to beef up tax 
collections and pay for COVID-19 pandemic 
relief. But Biden was able to protect $5 bil- 
lion allocated for Project Next Gen, which 
will develop improved coronavirus vaccines 
and drugs. 

The negotiators are presenting the agree- 
ment as a victory, but hard-liners in both 
parties are dismayed by their side’s conces- 
sions. If approved by the House of Repre- 
sentatives and the Senate, the 99-page bill 
would delay imposing any new ceiling until 
January 2025, taking default off the political 
agenda until after the presidential election 
in November 2024. 

For scientists, the real drama will occur 
this year, as Congress works out the de- 
tails of government spending in FY 2024. 
Those negotiations will pit Biden’s request 
for healthy increases at several federal re- 
search agencies against a push by Repub- 
licans, who control the House but not the 
Senate, to reverse 2 years of sizable growth 
in federal research budgets after the previ- 
ous budget cap was lifted. 

“It’s in the hands of the appropriators 
now,” says Jennifer Zeitzer of the Federa- 
tion of American Societies for Experimental 
Biology, referring to the members sitting on 
the committee that writes spending bills for 
every federal agency. “Flat funding makes it 
more challenging, but it’s too soon to say how 
much more.” 

Recent increases for research were part 
of the trillions of dollars in new government 
spending in laws passed by Congress since 
Biden took office in January 2021. The leg- 
islation includes landmark measures to re- 
build the nation’s infrastructure, bolster the 
U.S. semiconductor industry, and combat 
climate change. 

This week’s agreement would halt that 
rising tide. To boost research spending even 
modestly in FY 2024, Congress would have 
to reallocate some of what’s been targeted 
for thousands of other discretionary, civilian 
programs across the government. (More than 
half of all federal spending goes to mandatory 
payouts such as Social Security, Medicare, 
and interest on federal borrowing, accounts 
that fall outside the annual allocation.) 


President Joe Biden (right) and House Speaker Kevin McCarthy (R-CA) negotiated a deal on the debt ceiling that could squeeze science spending. 

Civilian and military spending would 
continue to be constrained in FY 2025, 
rising by only 1% over 2024 levels. (Re- 
publicans had initially sought to impose 
a decade’s worth of tight caps before set- 
tling for 2 years.) The agreement also con- 
tains a clause requiring a 1% cut in overall 
discretionary spending if Congress misses 
its 1 October deadline to pass a full-year 
budget and temporarily freezes spending. 
That penalty would be removed, however, 
if Congress later passes a more detailed 
spending plan in either year. 

Science agencies with the most ambi- 
tious plans have the most to lose from the 
proposed 2-year spending restrictions. For 
example, Biden’s 2024 budget request to 
Congress, submitted in March, includes a 
19% increase, to $11.3 billion, for the Na- 
tional Science Foundation (NSF). 

A high priority is NSF’s new technology 
directorate, designed to translate basic 
research findings into new technologies 
and businesses. Congress itself had aimed 
even higher, adopting a 5-year spending 
blueprint for NSF in the 2022 CHIPS and 
Science Act to strengthen the U.S. semi- 
conductor industry and related fields that 
would have allowed NSF’s budget to reach 
$15.6 billion in 2024. 

“A flat 2024 budget leaves a 2-year, $7 bil- 
lion gap with CHIPS,” Hourihan notes. “Those 
levels were always aspirational, but under the
new agreement we’re barely trying.”

Biomedical researchers are also worried 
about the impact of a flat budget. They 
were hoping to do much better than the 2%
increase ($920 million) Biden requested in
2024 for the National Institutes of Health 
(NIH), half of which would go to the Na- 
tional Cancer Institute. 

“That was a terrible number, and we’ve 
been making the case this spring for a bigger 
increase,” Zeitzer says. Biden is also seeking 
$1 billion more for the new Advanced Re- 
search Projects Agency for Health, which 
this year received $1.5 billion. 

The Department of Energy’s science pro- 
grams are slated for big increases under the 
CHIPS act, with Biden’s 2024 request for an 
8% increase as a down payment. But energy 
lobbyists say winning that $680 million 
boost, which included a large hike for in- 
dustry partnerships to accelerate progress 
in fusion energy, will now be a stretch. 

Several of NASA’s science missions also 
need a big increase to stay on course. So a 
tight budget could trigger a political fight 
pitting the Biden administration’s priori- 
ties on climate missions against congres- 
sional support for planetary exploration. 
A flat NASA science budget could result in
delays to one or more missions.

As Science went to press, Congress was 
expected to vote on the agreement with the 
House going first as early as 31 May. Legis- 
lators hoped to pass the measure in time to 
avert a default. 

Once the agreement is in place, science 
advocates say the time to press their case 
will come after Congress divvies up the 
total amount available for discretionary 
spending among the 12 appropriations 
subcommittees, a step that could happen 
before the 4 July recess. 

“Once we see those numbers, we’ll have 
a much clearer picture of what FY 2024 will 
look like,” Zeitzer says. “So I think it’s still 
possible for NIH to do much better.” 

NIH cracks down on clinical trials reporting


Agency says it has brought more than 200 investigators into compliance since July 2022 


By Meredith Wadman 


Last year, the U.S. National Institutes of
Health (NIH) delivered a stern warning to
two in-house clinical researchers who had
broken an important rule. They had failed
to submit the results
of two clinical trials they had overseen
to ClinicalTrials.gov, a database meant to 
inform the public about human studies and 
their results. The reporting requirement 
has often been ignored, but this time the 
agency took an unprecedented step: It told 
the scientists it wouldn’t approve any more 
of their research until they fell in line. 

After that warning and other agency 
actions, the pair complied, well after the 
1-year deadline. 

The episode, described in a Government 
Accountability Office (GAO) report pub- 
lished in April, adds to other, systematic 
changes NIH has recently undertaken to en- 
sure that the more than $6 billion in clini- 
cal trials it funds annually, along with their 
results, are visible to scientists, physicians, 
patients, and ultimately taxpayers. Trans- 
parency advocates say the tougher stance 
is beginning to pay off. For example, GAO 
reported that between July and November 
2022, the agency brought 235 extramural 
researchers into compliance with registra- 
tion and reporting requirements. 

“We really do like some of the changes 
that the NIH has made. We think that that’s 
a really great start,” says Navya Dasari, a
lawyer who until recently headed efforts 
by the nonprofit activist group Universi- 
ties Allied for Essential Medicines to in- 
crease transparency of clinical trial results. 
Candice Wright, lead author of the GAO re- 
port, says NIH “should be ensuring compli- 
ance [with the policy]. It exists for a reason.” 

Under a 2007 law, sponsors running 
many clinical trials of drugs and devices— 
including those funded by NIH—are required
to register them on ClinicalTrials.gov
within 21 days of enrolling the first volunteer.
The results generally must be submitted
to ClinicalTrials.gov within 1 year of
when key data are collected on the last par- 
ticipant. The law directs NIH to shut down 
funding to any institution whose research- 
ers are not up to date. 

But NIH has done little to enforce the re- 
quirements, even after it put in place a new 
policy in 2017 that expanded them to cover 
all NIH-funded trials and media reports began
to throw a spotlight on problems.

As recently as August 2022, the U.S. De- 
partment of Health and Human Services’s 
Office of the Inspector General found that 
just 35 of 72 NIH-funded clinical trials due 
to report their results in 2019 and 2020 
had done so in a timely manner—and that 
25 had not submitted them at all. 

NIH has recently taken steps to bring
those numbers up. They include having
both the funding institute and the Office of
Extramural Research contact tardy investigators
to bring them into compliance. And
GAO noted that extramural investigators
are now required to show NIH proof of trial
registration and results reporting before filing
the annual progress reports necessary
to receive their grant’s next year of funding.

Michael Lauer, NIH’s extramural research
chief, credited the agency’s changes
when he gave updated numbers for 530
extramural trials required to report results in
2020, 2021, and 2022. In a March blog post,
he reported that fully 96% of these trials
had reported results to ClinicalTrials.gov.
Only 37% had met the 1-year deadline, however,
and in 2022 the median for tardiness
was 400 days.

“Clearly, we still need to improve, and
we are committed to taking this challenge
head on,” Lauer wrote on the blog. “Moving
forward, you will see increased communication
from us and, if needed, enforcement
actions to get us to where we need to be.”

NIH’s critics say the agency still needs to
do more. The GAO report also found that
16% to 18% of trials are registered late—
a number that did not budge from 2019
through 2022. (The numbers are worse for
pediatric trials, a recent study reported.)
The tardy performances included NIH’s
own institutes, led by the National Cancer
Institute, where 81 trials were registered
late in that period.

Deborah Zarin, who directed ClinicalTrials.gov
from 2005 to 2018, argues that
trial registration and results reporting is as
important as getting a research volunteer’s
informed consent to participate in a study.
“What if I told you that 18% of trials had not
obtained informed consent? You’d probably
be appalled,” says Zarin, who is now at Harvard
University and Brigham and Women’s
Hospital. She and others note that the information
is needed for many reasons, from
making sure two research groups don’t repeat
the same trial to revealing failed trials
that often aren’t published so others can
steer away from those approaches.

Many investigators at NIH’s Clinical Center have been slow to post their trial results in a federal database.

Till Bruckner, a policy analyst who 
founded TranspariMED, a campaign aimed 
at ending evidence distortion in medicine, 
calls NIH’s recent actions “an improvement.” 

But Bruckner thinks NIH should pull 
funding from entire institutions that have a 
track record of poor compliance with the re- 
quirements. “If NIH would just once crack 
down properly on institutions, not only on 
individuals, that would send such a strong 
signal that going forward, 95% of the problem
would be solved.”

Researchers have used data that track the use of pesticides and other farm chemicals at the county level in a wide array of health and environmental studies.

Scientists protest changes to U.S. pesticide data


Move to reduce scope and frequency of U.S. Geological Survey database sparks concern 


By Virginia Gewin 


Last year, Alan Kolok, an ecotoxicologist
at the University of Idaho, published a
study that found the incidence of cancer
in counties across 11 western U.S. states
was correlated
with the use of farm chemicals called
fumigants, which kill soil pests. The fine- 
grained analysis was feasible, he says, be- 
cause a U.S. government database made 
timely, county-level statistics on pesticide 
use publicly available. 

Now, Kolok is one of many scientists con- 
cerned that changes to the National Pesti- 
cide Use Maps database will make it far less 
useful to scientists. Last month, he joined 
more than 250 researchers and dozens of 
public health and environmental groups in 
urging the U.S. Geological Survey (USGS), 
which oversees the database, to reconsider 
moves to reduce the number of chemicals it 
tracks and to release updates less frequently. 

The agency says the changes are being
driven, in part, by budget constraints and a 
desire to align the pesticide survey with its 
other research programs. But in an open 
letter to USGS, critics say the changes en- 
danger a database that provides “vital in- 
formation and tracks trends that are not 
available anywhere else.” 

The USGS data have played a role in more 
than 500 peer-reviewed studies, the letter 
notes, including highly cited works on the 
impact of pesticides on public health, water 
quality, and ecosystems. Instead of reduc- 
ing the database’s scope and frequency, the 
critics say USGS should be expanding it in
order to better track the estimated 540 million
kilograms of pesticides used annually in the 
United States. “We need credible sources 
of data to be able to study and understand 
what this widespread pesticide use means 
to the health of people and the environ- 
ment,” the letter states. 

At its height, the USGS database, which
dates to 1992, tracked the shifting use of
more than 400 chemicals to control insects,
fungi, weeds, and other pests. Each year,
the agency typically released preliminary
maps documenting pesticide use 2 years
prior. To make the maps, agency
staff combined farm data on pesticide
use on specific crops—purchased from 
Kynetec, a company based in the United 
Kingdom—with crop acreage data from the 
U.S. Department of Agriculture. 

In recent years, however, USGS has nar- 
rowed its approach. The most recent data 
release, which covered 2018 and 2019, 
included only 72 compounds that USGS 
judged to be especially important because 
of their widespread use and toxicity. In a 
statement, the agency said the shorter list 
aligns the survey with “the list of pesti- 
cides that USGS routinely collects data on 
for water quality purposes.” 

On 25 May, the agency said there are no 
immediate plans to expand the list. It also 
said that, from now on, it would not release
the preliminary data every year. Instead, 
USGS expects to release its next full report, 
covering 2018 to 2022, in late 2024; reports 
will be published every 5 years starting in 
2029. The schedule change could save the 
agency roughly $100,000 each year. 

Many scientists aren’t happy with those
decisions. “This plan to just keep the program
running on life support does not
reflect how important it is,” says Nathan
Donley, a senior scientist at the nonprofit
Center for Biological Diversity. Having to
wait 5 years for data, he argues, will make
it impossible for researchers to detect
trends and potential problems early and
address them quickly. The data are “basically
just a history lesson at that point,” he
says. “What’s the point ... if you’re going
to make it harder for the public to use the
data in any meaningful way?”

Others say the agency should be track- 
ing more pesticides, not fewer. “There 
are literally hundreds of active ingre- 
dients and thousands of products that 
are applied on croplands,” notes Christy 
Morrissey, an ecotoxicologist at the Univer- 
sity of Saskatchewan who studies pesticide 
impacts on birds and insects. Research- 
ers say USGS should not only restore its 
original tracking list—which included 
antibiotics such as oxytetracycline and 
streptomycin—but also add any new farm 
chemicals approved by the Environmental 
Protection Agency (EPA). “The most wide- 
spread pollutants today aren’t necessarily 
going to be the most widespread in 5 or 
10 years,” says Donley, who notes that EPA 
approves about five new products each year. 

Some scientists also want USGS to re- 
start efforts to track one of the fastest 
growing uses of pesticides: seed coatings 
that protect against, for example, plant 
diseases or nematodes. Kynetec stopped 
tracking chemicals used to coat seeds in 
2014 because surveys were deemed too 
complicated to conduct accurately. One 
result is that researchers are now unable 
to track the full extent of neonicotinoids, 
controversial chemicals that have been 
linked to dwindling bee populations. (In 
January, researchers published a paper in 
the Proceedings of the National Academy 
of Sciences that relied on USGS data from 
2008 to 2014, when it still included coated 
seeds. The study concluded that neonicoti- 
noids had harmed populations of the west- 
ern bumble bee.) 

As Science went to press, neither USGS 
nor its parent agency, the Department of the 
Interior, had formally responded to the sci- 
entists’ pleas. 


Virginia Gewin is a journalist in Portland, Oregon. 

Primate genomes offer new view
of human health and our past 


Sequencing efforts may also aid primate conservation 


By Elizabeth Pennisi 


Humans have long seen themselves
mirrored in other primates, with
apes’ social behavior and cognitive
abilities shedding light on our
own. Now, two international teams
have stared deeper into the mirror.
By sequencing the genomes of more than
200 nonhuman primates, from palm-size
mouse lemurs to 200-kilogram gorillas, they
have come up with clues to human health
and disease, and to the origin of our species.

The genomes and their analyses, reported
this week in Science and Science Advances,
represent a massive effort involving more than
100 researchers from about
20 countries who braved logistical
challenges and bureaucratic gauntlets to collect
blood samples from some
800 wild and captive primates.
The resulting data
show how knowing a primate’s
genetic diversity could
improve the odds of saving
highly endangered species.
But our own species could
also benefit. One team used
the genomes to train a machine
learning tool that
could assess whether human
genetic variants are likely
to cause disease. And both
explored the complexity of
primates’ evolution, shedding light on our
own. “This massive sample will ultimately
spark new and unexpected research directly
relevant to human origins,” says Luis Darcy
Verde Arregoitia, a mammalogist at the
Mexico Institute of Ecology who was not
involved with either group.

The bigger of the two genome efforts was
spearheaded not by a primatologist or evolutionary
biologist, but a clinical geneticist at
the DNA-sequencing company Illumina. For
Kyle Farh, like many in medicine, the genomics
revolution has been a source of frustration
as well as hope. Human gene sequencing
has turned up myriad variants of individual
genes that might explain diseases or treatments.
But human genetics alone often can’t
tell whether a variant is medically relevant.

Farh thought he could find more clarity
by searching for analogous variants
in other primate species. “We recognized
that data from our own species was
insufficient.” After testing the idea with the
primate genomes available several years
ago, in 2019 he reached out to evolutionary
geneticist Tomas Marques-Bonet from the
Institute of Evolutionary Biology in Barcelona,
Spain, and primate geneticist Jeffrey
Rogers at Baylor College of Medicine with a
proposal. If they could come up with blood
samples from multiple members of many
of the world’s 500-plus primates, Illumina
would help fund the DNA sequencing.

The ambition was staggering, say
some scientists outside
the project. “It takes an
enormous amount of
time, effort, and government
permits to obtain genetic
samples of wild primates,”
says Paul Garber, a
biological anthropologist
emeritus at the University
of Illinois Urbana-Champaign.
And it’s even more
difficult for species classified
as threatened—which
more than 60% of nonhuman
primates are.

Undaunted, Marques-Bonet
signed up researchers
around the world. “It
was an amazing opportunity
to expand the scope of my research
interests,” recalls ecologist Jean Boubli, who
grew up and worked in Brazil before setting
up a U.K. lab at the University of Salford.
He contributed samples for 77 South
American species, most obtained during
his 30 years of exploring and living in the
Amazon, collaborating with local scientists,
museums, and zoos.

Getting blood samples from anesthetized
or restrained wild primates in zoos
or captive breeding centers was often
challenging, says another contributor,
Govindhaswamy Umapathy. A conservation
biologist at the Centre for Cellular
and Molecular Biology, Umapathy traveled
from state to state in India to lobby forest
managers and local officials for access to
gibbons, lorises, macaques, and lemurs.


2 JUNE 2023 • VOL 380 ISSUE 6648 881


NEWS | IN DEPTH

Led by Marques-Bonet’s postdoc Lukas 
Kuderna, now at Illumina, the consortium se- 
quenced 703 individuals of 211 species using 
“short-read” technology in which DNA is first 
broken into small bits. The new data joined 
106 already sequenced genomes from 29 
additional primate species and a set of new 
genomes for 27 other primate species. Those 
genomes came from the second consortium, 
co-led by Dong-Dong Wu, a geneticist at the 
Chinese Academy of Sciences’s Kunming In- 
stitute of Zoology, which used a technique 
that read longer stretches of DNA. 

With their data and the other 
primate genomes, Wu and his 
colleagues honed the family tree 
for this group of mammals and 
identified unexpected genomic 
rearrangements—duplicated or 
inverted regions of chromosomes, 
for example—that distinguished 
primates living in different envi- 
ronments, such as tropical rain- 
forest and semidesert. Further 
study may reveal whether the 
shuffling helped those species 
adapt to the various conditions. 

The trove of primate genomes 
allowed Farh, Rogers, Marques- 
Bonet, and colleagues to go 
hunting for single nucleotide 
polymorphisms (SNPs), individ- 
ual DNA base variations within 
or between species that may 
change the proteins encoded 
by genes or alter a gene’s activ- 
ity. They found 4.3 million that 
altered a protein’s amino acid 
sequence. “The initial presenta- 
tions took my breath away,” re- 
calls Amanda Melin, a biological 
anthropologist at the University 
of Calgary who provided samples 
of Costa Rican primates. “The 
scale of it was really staggering.” 

On the assumption that a hu- 
man SNP with commonly ob- 
served counterparts in primates 
probably doesn’t cause disease, Farh exon- 
erated many human variants. His team also 
used the “benign” primate SNPs to train a 
neural network, called Primate AI-3D. With 
AlphaFold, a protein-structure prediction 
tool based on artificial intelligence (AI), as 
its scaffold, his program builds 3D models 
of each protein. Based on the benign SNPs, 
it identifies regions where changes to the 
protein’s structure would not disrupt its 
function. Conversely, changes in other re- 
gions were more likely to cause problems. 

He then applied the AI to predict the po- 
tential harm of human SNPs. And when he 
and colleagues matched those predictions 
with a database of human base changes
that had been tentatively linked to dis- 
eases, they concluded 6% of the SNPs are 
likely innocent. “I was a bit skeptical” at 
first, says Kaitlin Samocha, a geneticist at 
Massachusetts General Hospital. But, “This 
resource is a great way to ‘rule out’ a vari- 
ant as being damaging and does move the 
needle on our ability to interpret protein- 
altering variation.” 

The team also used the primate-trained
AI to do the opposite: Identify harmful
genes. They applied it to the health records
and gene variant data of 454,712 people in
the UK BioBank to find SNPs likely to play
a role in 90 human health concerns. “It allows
us to identify which genes are potential
drug targets,” Farh says.

Tarsiers like this one were among hundreds of primates whose DNA was sequenced.

Neil Risch, a geneticist at the University 
of California, San Francisco, says other 
researchers will need to vet the AI predic- 
tions. But he does think these primate ge- 
nomes “are treasured samples.” 

Evolutionary biologists agree. Already 
the genomes have revealed an important 
role in evolution for hybridization, once 
thought to be rare. In one Science paper, 
Wu and his colleagues show that the critically
endangered gray snub-nosed monkey,
which is endemic to mountains in south-central
China, arose after the golden snub-nosed
monkey mated with the ancestors
of two other species in that genus,
Rhinopithecus. Moreover, one of the three groups
of macaques arose through hybridization 
between the other two, about 3.5 million 
years ago, they report in Science Advances. 

The other consortium, led by Rog- 
ers, also found signs of rampant hybrid- 
ization in the DNA of 225 wild baboons 
from multiple species, which conservation 
biologist Julius Keyyu at the Tan- 
zania Wildlife Research Insti- 
tute helped obtain and analyze. 
“This work provides a potential
analog to recent human evolution,”
notes Eleanor Scerri, an
evolutionary archaeologist at
the Max Planck Institute of
Geoanthropology. Increasing evidence
shows that intermingling once oc- 
curred among various hominids— 
Neanderthals, modern humans, 
Denisovans, and maybe others— 
tens of thousands of years ago. 

The primates that are deliver- 
ing these insights are themselves 
under threat from habitat destruc- 
tion and other human activity. But 
a surprising finding from the stud- 
ies could aid efforts to save them. 
Normally a population crash in a 
species also narrows its genetic 
diversity, thanks to inbreeding 
among the survivors. Yet all but 
15 primate species sequenced by 
the team still had relatively high 
genetic diversity—higher than 
humans. That was true even in 
extremely endangered ones such 
as the northern sportive lemur 
(Lepilemur septentrionalis) of
which only 40 are known to exist, 
all within 12 square kilometers 
of Madagascar. 

This suggests the primates’ 
population crashes, some likely 
caused by human habitat destruction, were 
so recent that there hasn’t been time for in- 
breeding to lower the species’ diversity. “The 
population declines are so rapid that genetics
does not manage to catch up with it,”
says Katerina Guschanski, an evolutionary 
biologist at the University of Edinburgh and 
Uppsala University. 

Umapathy and others say the finding 
is encouraging, because higher diversity 
should make species more resilient. As 
animal ecologist Fabiano Melo from Viçosa
Federal University, who collaborates with 
Boubli, points out, “It means that we still 
have time to revert this situation.” 

The U.S. Geological Survey is funding mapping of metamorphic rocks in eastern Alaska that are likely to hold a number of critical minerals, including rare earths.


TREASURE HUNT 


The first U.S. nationwide geological survey in a generation could 
reveal badly needed supplies of critical minerals 


By Paul Voosen


From the air, Maine is a uniform sea
of green: Forests cover 90% of the
state. But beneath the foliage and
the dirt lies an array of geological
terrains that is far more diverse,
built from the relics of volcanic
islands that collided with North
America hundreds of millions of
years ago.


Two years ago, sensor-laden aircraft began 
to survey these geochemically rich terrains 
for precious minerals. Researchers spot- 
ted an anomalous signal streaming out of 
Pennington Mountain, 50 kilometers from 
the Canadian border. State geologists
bushwhacked through the paper mill-bound
pine forests, taking rock samples. They
eventually uncovered deposits containing 
billions of dollars’ worth of zirconium, nio- 
bium, and other elements that are critical in 
electronics, defense, and renewable energy 
technologies. “It was a perfect discovery,” 
says John Slack, an emeritus scientist at the 
U.S. Geological Survey (USGS) who worked 
on the Maine find. He expects more like it.
“We think there’s potential throughout the
Appalachians.”


Hunting high and low

Armed with a $320 million boost from Congress, the U.S. Geological Survey is funding airborne and field campaigns to identify rocks likely to hold minerals critical for renewable energy and electronics, like lithium and rare earth elements. The campaign, called the Earth Mapping Resources Initiative (Earth MRI), is the first major assessment of the country’s mineral wealth in nearly half a century. It is deploying different techniques depending on the geology of each region.

Low-flying aircraft outfitted with
magnetometers can survey iron-bearing
rocks hidden in the shallow
earth. Gamma ray spectrometers
hunt for the radioactive signature of
rare earth elements.

Earth MRI is helping complete a
high-resolution topographic map using
airborne laser altimeters, or lidar.
These data are essential for geological
mapping and can reveal the surface
expression of ancient landforms.

In the arid West, where trees don’t
block the view, flights using a
NASA hyperspectral instrument
will hunt for the signature of
minerals in hundreds of channels
of reflected light.

The agency is sponsoring field
mapping campaigns by state
geologists. It is also funding
broader geochemical surveys
and studies of mineral resources
left in old mine waste piles.

Few topics draw more bipartisan sup- 
port in Washington, D.C., than the need for 
the United States to find reliable sources of 
“critical minerals,” a collection of 50 mined 
substances that now come mostly from 
other countries, including some that are 
unfriendly or unstable. The list, created by 
USGS at the direction of Congress, contains 
not only the 17 rare earth elements produced 
mostly in China, but also less exotic materi- 
als such as zinc, used to produce steel, and 
cobalt, used in electric car batteries. “These 
commodities are necessary for everything,” 
says Sarah Ryker, USGS’s associate direc- 
tor for energy and minerals. “They’re also a 
flashpoint for conflict.” 

But last decade, when lawmakers began 
to ask USGS about U.S. supplies, the re- 
sponse was unsettling: The agency didn’t 
even know where to look. For decades, com- 
panies had been moving mining operations 
abroad, in part to avoid relatively stringent 
U.S. environmental regulations. The basic 
exploration needed to identify mineral re- 
sources and spur corporate interest had lan- 
guished. The last nationwide survey, a quest 
for uranium, ended in the 1980s. Ryker says 
the U.S. is “undermapped” compared with 
most developed countries, including Aus- 
tralia, Canada, and even Ireland. “We're at 
an embarrassing point.” 

To start filling in this knowledge void, 
USGS in 2019 began what it calls the Earth 
Mapping Resources Initiative, or Earth MRI. 
With a modest $10 million annual budget, the 
agency began working with state geological 
surveys to digitize data and commission 
fieldwork to map the most promising terrain 
in fine detail. 

Then, in 2021, the Bipartisan Infrastruc- 
ture Law directed $320 million into the 
program—nearly one-third of the entire 
USGS budget—to be spent over 5 years. That 
spending has already enabled hundreds of 
survey flights, and it is opening a golden age 
for economic geology. It is also a boon for 
basic science—filling in gaps in geologic his- 
tory, identifying unknown earthquake faults, 
and revealing geothermal systems. “We’re 
seeing a renaissance throughout the whole 
country,” says Virginia McLemore, an eco- 
nomic geologist at the New Mexico Bureau 
of Geology and Mineral Resources. “I’ve been 
training all my life to get to this point.” 

The discoveries could spur a rash of min- 
ing, and environmentalists are wary. If 
USGS spots promising ore systems, compa- 
nies will have to show that they can develop 
them safely and with minimal environmen- 
tal impact, says Melissa Barbanell, direc- 
tor of U.S.-international engagement at the 
World Resources Institute, an environmen- 
tal nonprofit. “It can never be zero harm,” 
she says. “But how can we minimize the 
harm and keep it to the mine itself?” 

Mining companies, meanwhile, are em- 
bracing Earth MRI. Donald Hicks, a geo- 
physicist at global mining giant Rio Tinto, 
which has dozens of operations worldwide 
but only a few in the U.S., says he has en- 
couraged fellow miners to collaborate and 
share data with the program. Rio Tinto even 
funded some USGS flights in Montana, in re- 
turn for 1 year’s exclusive access to the data. 
“Having this high-quality, large-scale data in 
the public domain will drive new ideas and 
new discoveries,” Hicks says. 


FOR MOST OF THE HISTORY of mining, the ori- 
gin story of a mineral lode was beside the 
point. Prospectors found it and miners dug 
it up. But by now, most of the obvious finds 
are gone, says Anne McCafferty, a USGS 
geophysicist. “The low-hanging fruit has 
been picked.” 

This scarcity has pushed Earth MRI into 
adopting a “mineral systems” approach, 
first pioneered in Australia, that attempts 
to predict where critical minerals might 
be found based on the processes that form 
them. For example, a search for rare earth 
minerals might begin by looking for an un- 
usual kind of carbon-rich rock called a car- 
bonatite, which often contains pockets of 
rare earths formed when it crystallized out 
of lava. Or geologists might seek out clay- 
rich rocks or sediments that can capture 
concentrations of the rare earths after wa- 
ter erodes them from a source rock. Pros- 
pectors would also look for signs that these 
ore rocks were preserved across the eons. 

To assemble these telltale rock histories, 
USGS scientists need to integrate a variety 
of information sources. Some already exist: 
large-scale geological maps based on de- 
cades of fieldwork, and surveys of the deep 
structure of rock formations based on the 
reflections of seismic waves from artificial or 
natural earthquakes. 

Earth MRI’s airborne surveys, with flights 
just 100 meters above the surface, will add 
much more detail and inform a new gen- 
eration of sharper geologic maps. One tool 
affixed to the aircraft is a magnetometer, 
which detects rocks rich in iron and other 
magnetic minerals—often a clue that they 
hold critical minerals. Another is a gamma 
ray spectrometer, which like a Geiger coun- 
ter can capture the radiation emitted by 
thorium, uranium, and potassium. Those 
elements frequent the same volcanic rocks 
as rare earth minerals and are often incor- 
porated into their crystal structures. Other 
aircraft carry laser altimeters that can map 
surface relief to reveal geologic history. And 
a pioneering “hyperspectral” instrument de- 
veloped by NASA can identify minerals ex- 
posed on the surface based on the specific 
wavelengths of light they absorb. In the 
combined data, “You can see all the geology 
underneath,” says Anjana Shah, the USGS 
geophysicist leading the agency’s East Coast 
airborne surveys. “It’s a very powerful way of 
understanding the Earth.” 

In Nevada, a helicopter carrying an induction coil measures subsurface electrical properties 
(top left) and a researcher calibrates data collected by an airborne hyperspectral sensor 
(right). In Maine, geologists carry sensors to chart rocks’ properties (bottom left). 

In early forays, Earth MRI aircraft criss- 
crossed North and South Carolina, tracing 
the ancient roots of the landscape. Hidden 
beneath the states’ tobacco farms are fossil- 
ized beaches that mark shorelines left dur- 
ing the warm periods between past ice ages, 
when sea levels were higher than today. La- 
ser altimeter maps capturing subtle relief 
bloom with those shorelines and the paleo- 
rivers that dissected them, says Kathleen 
Farrell, a geomorphologist at the North Car- 
olina Geological Survey. “There’s a lot more 
coastal plain than anyone thought.” 

The ancient beaches hold deposits of 
black sands, eroded from mountains and 
deposited by rivers, that are rich in heavy 
elements. By combining the new airborne 
data collected by Shah with field mapping 
and boreholes drilled to sample the deep 
sediments, Farrell and her colleagues hope 
to learn how the Carolina sands originated. 
They want to know how the coastal plains 
were assembled over time, why the heavy 
sands formed only during certain periods, 
and where upriver those sands came from. 
The answers should help guide geologists to 
new heavy metal deposits; similar sites in 
northern Florida are among the few com- 
mercial sources of titanium in the U.S. 

The airborne campaigns in South Caro- 
lina will have another benefit, Shah adds: 
They flew over Charleston, collecting mag- 
netic data that, by identifying shifts and off- 
sets in subsurface rocks, reveal the hidden 
seismic faults that ruptured in 1886 in an 
earthquake as large as magnitude 7. Such a 
quake, if it struck again today, would cause 
billions of dollars in damage. 

This year, an Earth MRI survey cover- 
ing parts of Missouri, Kentucky, Tennessee, 
Arkansas, Illinois, and Indiana will probe 
another mysterious seismic zone. Buried 
under kilometers of sediment lurks the 
Reelfoot Rift, a gash in the continent’s bed- 
rock likely created some 750 million years 
ago when the Rodinia supercontinent be- 
gan to crack apart. In 1811 and 1812, faults 
tied to this rift caused the New Madrid 
earthquakes, the largest to ever strike the 
U.S. east of the Rocky Mountains. But de- 
spite the potential hazard, the fault zone 
remains poorly understood. 

The Reelfoot and nearby bedrock defor- 
mations not only create hazards; they also 
create opportunities for minerals to form. 
The rifts provided conduits for magma to 
well up much later in geologic time, when 
Africa collided with North America to form 
the Appalachian Mountains. This magma is 
thought to have expelled gases that flowed 
into limestones, chemically altering them. 
One result is the fluorspar district of south- 
ern Illinois, which once produced a majority 
of the country’s fluorite—used to smelt steel 
and create hydrofluoric acid. 

Magnetic anomalies (red) beneath southeastern Missouri reveal iron oxide deposits formed 1.4 billion years ago. 

Those magma injections could have played 
a role in creating Hicks Dome, which rises 
1 kilometer above the Illinois countryside 
and is the closest thing the state has to a 
volcano. Jared Freiburg, critical minerals 
chief for the Illinois State Geological Survey, 
calls it “a crazy magmatic cryptovolcanic ex- 
plosive structure.” It pops out as a magnetic 
anomaly in USGS airborne data, and cores 
drilled from the dome are rich in rare earth 
minerals. Geochemical tracers from the cores 
hint that deposits deeper in the dome were 
formed from carbonatites—the unusual vol- 
canic rocks associated with the world’s best 
rare earth deposits. “It’s like a kitchen sink of 
critical minerals there,” McCafferty says. 

The midcontinent surveys could also help 
geologists assess another resource: natural 
hydrogen, a clean-burning fuel. Currently, 
all hydrogen is manufactured, but some re- 
searchers believe, contrary to conventional 
wisdom, that Earth produces and traps 
vast stores of the gas (Science, 17 February, 
p. 630). The iron-rich volcanic rocks of the 
Reelfoot are exactly the kind that could pro- 
duce hydrogen. Yaoguo Li, a geophysicist at 
the Colorado School of Mines, is developing 
a Department of Energy (DOE) grant pro- 
posal to prospect for hydrogen source rocks 
with the USGS data. “We have not done any- 
thing yet,” he says. “But I can see there’s so 
much we can do.” 

Besides identifying resources to extract, 
the surveys could pay other dividends. 
They are pinpointing the steel casings of 
abandoned oil and gas wells that often leak 
greenhouse gases. They will help identify 
porous rock reservoirs, bounded by faults, 
that could hold carbon dioxide captured 
from smokestacks, keeping it out of the at- 
mosphere. And they could also map varia- 
tions in the radioactive rocks that emit 
radon gas, a health hazard. 


THESE DAYS, no mineral may be more criti- 
cal than the lithium, used in cellphone and 
electric car batteries, that moves an ever- 
increasing number of the world’s electrons. 
Yet only one lithium mine exists in the U.S., 
in Nevada, and its raw lithium is sent abroad 
for processing. The state has potential to hold 
much, much more, and could become an in- 
ternational lithium “epicenter,” says James 
Faulds, Nevada's state geologist. 

Lithium is often found in igneous rocks— 
magma that crystallized in the crust or lava 
that cooled on the surface. Many of the 
known lithium deposits are in the state’s 
north, in the McDermitt caldera, a volcanic 
crater formed 16 million years ago by the 
deep-Earth hot spot currently fueling Yellow- 
stone. Rainwater falling within the caldera 
or hot water from below has concentrated 
lithium within caldera clay deposits to levels 
not seen elsewhere, in other eruptions of the 
Yellowstone hot spot. “Why did this mineral- 
ization happen?” asks Carolina Muñoz-Saez, 
a geologist at the University of Nevada, Reno. 
She and her collaborators are studying the 
geochemistry of the lithium and the clays 
to find out whether the element was formed 
and concentrated during the eruption itself 
by superheated water or whether the concen- 
tration came later, as water infiltrated the cal- 
dera’s ash-rich rocks. The answer could lead 
the geologists to other, equally rich deposits. 

Earth MRI has already shown that lithium 
prospectors need not stick to calderas. Field 
geologists have found rocks that seem to be 
rich in lithium in basins bounded by tectoni- 
cally uplifted blocks of crust. Nevada, famous 
for its “basin and range” topography, has a lot 
of places like that, Faulds says. Even better, 
the basins tend to host systems of hot brine, 
a potential source of geothermal power—one 
reason DOE is funding surveys in the state, 
says Jonathan Glen, a USGS geophysicist. 

Mountain Pass in California is the only U.S. mine producing rare earth elements. The U.S. Geological Survey hopes Earth MRI will encourage more mining. 


Just south of Nevada, DOE has similarly 
invested in USGS flights over California’s 
Salton Sea, which is being stretched apart 
by the movement of the North American 
and Pacific tectonic plates, leaving the crust 
thin and hot. “Temperatures are really high,” 
Glen says. “There’s huge geothermal poten- 
tial.” Beyond mapping potential lithium de- 
posits and geothermal sites, the surveys have 
also found new faults at the southern end of 
the San Andreas, and what appear to be bur- 
ied volcanoes beneath the Salton Sea. “This 
is brand new stuff,” Glen says. “We didn’t 
know any of this.” 

Those insights come from magneto- 
meter, radiometric, and laser altimeter 
flights. But Earth MRI is also planning hyper- 
spectral surveys that will scan the treeless, 
arid surface for pay dirt. Lithium and rare 
earth elements, for example, have strong 
spectral reflections; and other signatures 
can reveal the iron or clay minerals associ- 
ated with lithium or other minerals. 
Beyond prospecting, the data will 
be valuable for spotting volcanic 
hazards. Those include rocks on the 
flanks of volcanoes that have been 
altered into soft clays by melting 
snow and heat, says Bernard Hub- 
bard, a remote-sensing geologist at 
USGS. “Those become unstable— 
and then they collapse.” 


BESIDES IDENTIFYING the rock for- 
mations likely to hold mineral de- 
posits, Earth MRI has accelerated 
USGS efforts to detect valuable re- 
sources left behind in tailings from 
defunct copper or iron mines. Last decade, 
Shah spotted the distinctive radioactive 
signatures of rare earths in such piles in 
Mineville, a hamlet in New York. With state 
geological agencies, USGS is compiling a 
national database of mine waste sites, along 
with methods for researchers to assess the 
waste’s mineral potential. “What’s the point 
of digging another hole in the ground if you 
can remine the rocks?” asks Darcy McPhee, 
Earth MRI’s program coordinator at USGS. 

Those lingering tailings piles are a re- 
minder of the environmental damage min- 
ing can do. For decades, the U.S. avoided 
environmental debates over mining by 
outsourcing it to other countries. The new 
consensus is that work should happen here, 
Ryker says. “But that means we have to deal 
with the conflict.” The survey will reveal new 
resources. But the rest is up to us, she says. 
“How much should we develop? That’s a 
much more complicated question.” 


The mineral stibnite is the ore for antimony, used in batteries. 


Those questions are now unfolding, 
state by state. In Nevada, lithium prospect- 
ing is booming, spurred by the Inflation 
Reduction Act’s mandate that electric cars 
must use some U.S.-sourced minerals for 
buyers to get a tax credit. But in Maine, 
legislators enacted a strict mining law in 
2017, when the state’s largest landowner, 
the Canadian forestry company J.D. Irving, 
considered exploiting reserves of gold, 
silver, and copper found on its lands. Fol- 
lowing the discovery of rare earth depos- 
its at Pennington Mountain and lithium 
elsewhere in the state, lawmakers are now 
considering amending the law to allow 
some responsible mining. 

Given the demands of green technology 
and the imperative to lower carbon emis- 
sions, many environmental groups are 
softening their stance on critical-mineral 
mining, Barbanell says. This exploitation 
doesn’t have to go on forever, she adds. 
Unlike coal, which must be mined 
indefinitely as it’s burned, the min- 
erals used for batteries and wind 
turbines can almost always be 
recycled—as long as policymakers 
push for their reuse. 

Slack would also welcome some 
mining. He retired to Maine for 
its natural splendor, but until re- 
cycling can cover society’s needs, 
critical mineral exploitation needs 
to happen somewhere. “We cannot 
have a low carbon future and green 
tech without mining,” he says. 
“It’s not an option. It’s a necessity. 
It’s essential.” 

Embodying measurement 


Measuring with body parts is a handy and persistent 
cross-cultural phenomenon 


By Stephen Chrisomalis 


Humans use multiple culturally-specific 
cognitive strategies for managing 
social and technical challenges. 
Measurement, the correlation of some 
target to some comparator or unit, is 
cross-culturally universal and has a 
deep history (1). Ethnographic and histor- 
ical analyses have documented these strat- 
egies for individual societies, but generaliz- 
ing across languages and cultures is still an 
incomplete task. On page 948 of this issue, 
Kaaronen et al. (2) show that body-based 
measuring is both common worldwide and 
builds on embodied cognitive properties 
that make such practices highly suitable for 
many measuring problems. Rather than con- 
sidering standardized measures such as the 
metric system as superior, the authors argue 
that body-based measurement is often ad- 
vantageous when solving human problems 
at human scales. Assessing 186 societies, 
past and present, they show that body-based 
measuring is globally prevalent because it 
is readily available to users, ergonomically 
adaptive, and linked to local knowledge, lan- 
guage, and tasks. 

Body-based measurement is one of a suite 
of cognitive technologies for representing, 
notating, and evaluating the world. Other 
related cognitive technologies include num- 
erical notations (3), finger-counting (4), arith- 
metical devices (5), coinage (6), and writing 
systems (7). They may involve artifacts, 
the body, or even, as in the case of number 
words or color terms, be principally linguis- 
tic. Cognitive technologies are interesting be- 
cause they organize thought, by structuring 
otherwise nebulous domains, and behavior, 
by affording practices that would otherwise 
be challenging. Crucially, cognitive technolo- 
gies are cultural—socially shared, not sim- 
ply one-off solutions by a single individual. 
Although they may not be formally standard- 
ized, cognitive technologies produce a com- 
mon lexicon for cooperative activities such as 
trade and craft production. They are embod- 
ied in individuals and embedded in language 
within and across communities. 

Kaaronen et al. use the Human Relations 
Area Files (HRAF) database of cultural ma- 
terials classified by subject to show that in 
a wide range of societies, body-based mea- 
suring systems persisted long after stan- 
dardized measures were introduced into 
various regions. This is because, for tasks 
involving the body, body-based measure- 
ments are the best solution to ergonomic 
and technical problems. 

Because many of the problems that hu- 
mans face are similar, and because human 
brains and bodies are similar, some robust 
generalizations about cognitive technologies 
can be made. For instance, the worldwide 
predominance of decimal numeral words 
is clearly related to the hands with their 10 
fingers (8). However, because there is no 
one perfect way to measure a length, and no 
one way to count on one’s fingers, cognitive 
technologies tend to sit in that interesting 
space between total cultural particularity 
and universality. Many linguistic and cul- 
tural phenomena have only a few “stable 
engineering solutions satisfying multiple 
design constraints” (9, p. 429). For example, 
there are only five basic structural principles 
underlying the world’s >100 numerical no- 
tations (such as the Roman numerals, Inka 
khipu, and modern Indo-Arabic positional 
systems), even though others are easily im- 
aginable (3). Far greater diversity has been 
observed in the world’s systems for counting 
using the fingers and the body than might 
be expected (4). Such diversity in cognitive 
technologies does not mean that anything 
can arise or that searches for patterns should 
be abandoned. Instead, it should compel fu- 
ture ethnographic and historical case studies 
into how people make decisions about which 
technologies they employ. 

Cognitive technologies provide a social 
and material structure on which to build con- 
ceptual representations. In Japan, day names 
and month names are commonly mapped 
onto the joints of the hand, creating a blended 
“material anchor” that provides a “handy” 
tool for computing the day of the week of 
any date (10). This might lead to a form of 
extended cognition in which the brain is seen 
as a necessary but insufficient part of a cogni- 
tive system (11). However, hands and fingers 
are not tools until they are conceptualized 
as tools by their users, their capacities are 
linked to a socially recognized problem, and 
a workable solution such as the span (the 
width of an open palm) or cubit (the length 
between fingertips and elbow) is developed 
and shared. In turn, these technologies can 
be internalized so that they can work even in 
the absence of the object itself. Once trained, 
users of arithmetical tools such as the 
Chinese suan pan and Japanese soroban can 
perform arithmetic using a “mental abacus,” 
a representation of an artifact that serves in 
lieu of the object itself, either on its own or 
supported by gesture (12). 

The study of cognitive technologies in ex- 
perimental conditions often suffers from the 
problem that the populations being studied 
are overly Western, educated, industrialized, 
rich, and democratic (WEIRD) (13). But an 
additional problem is that analyses might be 
biased toward modern societies. It should 
not be presumed that the cognitive tech- 
nologies of the past are identical to those of 
the present. Even though human bodies are 
similar to those of the past, human technical 
and social needs are potentially quite differ- 
ent. Thus, it is important to understand how 
information was structured, not only in the 
fraction of human existence that can be ob- 
served most directly but also in ancient and 
premodern societies throughout the world. 
Datasets such as HRAF, used by Kaaronen 
et al., are skewed toward ethnographically 
known societies from the 19th and 20th cen- 
turies. Future studies must employ cognitive 
cross-cultural research that is sensitive to 
how technologies change across time (dia- 
chronically), not merely synchronic patterns 
in the modern world. 

An ancient Greek metrological relief from the 
mid-fifth century BCE, depicting, when complete, 
a Greek fathom (orguia) of 209 cm and a 
foot (above the arm) of 29.6 cm, matching known 
measures of the period to idealized body parts. 

Caution should also be exercised when 
investigating any cognitive technology to 
avoid assuming either that it is universal or 
that, if present, its function is universal. In a 
Eurocentric framework, it is easy to assume 
that lexical structures such as color terms 
or number words are essential tools for pro- 
cessing the information that the world pro- 
vides. But there is evidence to suggest that 
neither of these are as universal as was once 
assumed. Powerful combinations of ethno- 
graphic, experimental, and linguistic evi- 
dence reveal that the correlation between 
color technology (such as painting and dye- 
ing) and the complexity of the color lexicon 
is neither simple nor predictable across so- 
cial scale (14). Body-based measures, simi- 
larly, are not some vestige or a cultural-evo- 
lutionary precursor, but have survived and 
thrived despite the presence of other forms 
of standardized measures, because they 
continue to be useful for societies and the 
individuals within them. 

A new method offers high-resolution three-dimensional 
printing and low-temperature firing 

Ceramics, glass ceramics, and glasses 
have a combination of properties that 
no other classes of materials (poly- 
mers and metals) display. For example, 
they have high chemical durability, 
hardness, bending strength, electri- 
cal resistivity, and transparency (for glasses). 
However, they are conventionally produced 
by sintering (consolidation of powder com- 
pacts by heating below melting point) or by 
high-temperature processing leading to a 
melt that is shaped and cooled to form a solid 
object. These forming processes require high 
temperatures and are limited by the mini- 
mum feature size achievable in a component. 
On page 960 of this issue, Bauer et al. (1) 
report that a photocurable polyhedral oligo- 
meric silsesquioxane (POSS) liquid precur- 
sor blended with a suitable acrylic oligomer 
and a photoinitiator allows the fabrication 
of highly transparent silica glass nano- and 
microstructures with high resolution using 
two-photon polymerization (TPP) followed 
by firing at low temperature. 

Alternative approaches to the high-tem- 
perature fabrication of ceramic and glass 
bulk components are based on the use of 
organic-inorganic precursors that can be con- 
verted to ceramics or glasses, after shaping, 
by a low-temperature heat treatment (i.e., 
500° to 1000°C). They include sol-gel precur- 
sors (2) and preceramic polymers (3), which 
are either liquid or easily soluble in common 
solvents. Molecular sol-gel precursors enable 
a wide range of mainly metal oxide materials 
to be obtained. Preceramic polymers offer a 
more limited compositional range of just sili- 
con- or boron-containing materials, but their 
polymeric nature allows high processability. 
In both cases, a fully inorganic material can 
be produced by thermally eliminating resid- 
ual organic moieties and/or completing con- 
densation reactions to form a network made 
up of, for example, metal-oxygen bonds. 
Depending on the composition, the organic- 
inorganic shaped body is transformed into 
a fully ceramic or glass material at low tem- 
perature, which allows cost and environmen- 
tal benefits, shorter processing times, and 
enhanced compatibility with other materials 
when fabricating multicomponent devices. 
TPP is a volumetric additive manufac- 
turing technology relying on the localized 
absorption of radiation for selective cross- 
linking of a photocurable material with 
submicrometer resolution. Attempts at pro- 
ducing silica-based structures using TPP 
demonstrated that it is possible to obtain 
defect-free glass components at a scale of a 
few hundred micrometers (4) or nanometers 
(5) by using photocurable systems containing 
colloidal silica particles. However, high-tem- 
perature sintering, either 1300° or 1100°C, 
was necessary to achieve properties similar 
to those of pure silica glass (fused silica). The 
presence of colloidal particles complicates 
the three-dimensional (3D) printing process 
that is used to shape the material owing to 
scattering, which limits the resolution of the 
printed features. An all-liquid, particle-free 
precursor system, such as the one proposed 
by Bauer et al., does not suffer from these 
drawbacks. However, efforts to process sol- 
gel solutions by TPP have mostly concen- 
trated on producing only unfired parts—thus 
developing organic-inorganic components 
(6) that do not have the range of favorable 
properties of an inorganic glass or ceramic. 
Using preceramic polymers, such as silox- 
anes, silicon oxycarbide (SiOC) parts were 
produced, although at a slightly lower resolu- 
tion than that reported by Bauer et al. and 
lacking transparency owing to the presence 
of residual free carbon in the glass structure 
(7-9). These siloxane precursors contain a 
large amount of carbon-containing moieties 
that renders them poorly suited to the fabri- 
cation of pure, transparent silica glass bodies 
because the exothermal oxidation occurring 
when firing them in air typically leads to the 
formation of microcracks. In a similar ap- 
proach to that of Bauer et al., a precondensed 
liquid silicone resin to which a silane acry- 
late was chemically bonded enabled a low- 
carbon-containing liquid precursor to be ob- 
tained that was photocurable and printable 
by TPP (10). This was converted to silica glass 
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An ancient Greek metrological relief from the 
mid-fifth century BCE, depicting, when complete, 
a Greek fathom (orguia) of 209 cm and a 

foot (above the arm) of 29.6 cm, matching known 
measures of the period to idealized body parts. 


the present. Even though human bodies are 
similar to those of the past, human technical 
and social needs are potentially quite differ- 
ent. Thus, it is important to understand how 
information was structured, not only in the 
fraction of human existence that can be ob- 
served most directly but also in ancient and 
premodern societies throughout the world. 
Datasets such as HRAF, used by Kaaronen 
et al., are skewed toward ethnographically 
known societies from the 19th and 20th cen- 
turies. Future studies must employ cognitive 
cross-cultural research that is sensitive to 
how technologies change across time (dia- 
chronically), not merely synchronic patterns 
in the modern world. 

Caution should also be exercised when 
investigating any cognitive technology to 
avoid assuming either that it is universal or 
that, if present, its function is universal. In a 
Eurocentric framework, it is easy to assume 
that lexical structures such as color terms 
or number words are essential tools for pro- 
cessing the information that the world pro- 
vides. But there is evidence to suggest that 
neither of these are as universal as was once 
assumed. Powerful combinations of ethno- 
graphic, experimental, and linguistic evi- 
dence reveal that the correlation between 
color technology (such as painting and dye- 
ing) and the complexity of the color lexicon 
is neither simple nor predictable across so- 
cial scale (J4). Body-based measures, simi- 
larly, are not some vestige or a cultural-evo- 
lutionary precursor, but have survived and 
thrived despite the presence of other forms 
of standardized measures, because they 
continue to be useful for societies and the 
individuals within them. 
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inorganic shaped body is transformed into 
a fully ceramic or glass material at low tem- 
perature, which allows cost and environmen- 
tal benefits, shorter processing times, and 
enhanced compatibility with other materials 
when fabricating multicomponent devices. 
TPP is a volumetric additive manufac- 
turing technology relying on the localized 
absorption of radiation for selective cross- 
linking of a photocurable material with 
submicrometer resolution. Attempts at pro- 
ducing silica-based structures using TPP 
demonstrated that it is possible to obtain 
defect-free glass components at a scale of a 
few hundred micrometers (4) or nanometers 
(5) by using photocurable systems containing 
colloidal silica particles. However, high-tem- 
perature sintering, either 1300° or 1100°C, 
was necessary to achieve properties similar 
to those of pure silica glass (fused silica). The 
presence of colloidal particles complicates 
the three-dimensional (3D) printing process 
that is used to shape the material owing to 
scattering, which limits the resolution of the 
printed features. An all-liquid, particle-free 
precursor system, such as the one proposed 
by Bauer et al., does not suffer from these 
drawbacks. However, efforts to process sol- 
gel solutions by TPP have mostly concen- 
trated on producing only unfired parts—thus 
developing organic-inorganic components 
(6) that do not have the range of favorable 
properties of an inorganic glass or ceramic. 
Using preceramic polymers, such as silox- 
anes, silicon oxycarbide (SiOC) parts were 
produced, although at a slightly lower resolu- 
tion than that reported by Bauer et al. and 
lacking transparency owing to the presence 
of residual free carbon in the glass structure 
(7-9). These siloxane precursors contain a 
large amount of carbon-containing moieties 
that renders them poorly suited to the fabri- 
cation of pure, transparent silica glass bodies 
because the exothermal oxidation occurring 
when firing them in air typically leads to the 
formation of microcracks. In a similar ap- 
proach to that of Bauer et al., a precondensed 
liquid silicone resin to which a silane acry- 
late was chemically bonded enabled a low- 
carbon-containing liquid precursor to be ob- 
tained that was photocurable and printable 
by TPP (10). This was converted to silica glass 
micro-optic components after firing in air at
600° to 1000°C, with a refractive index for the 
material heated at 600°C similar to that of 
fused silica. However, heating to 1000°C led 
to a further ~4% shrinkage, indicating that 
the material was not yet fully dense. Bauer 
et al. developed a photocurable blend based 
on a POSS that achieved ~100-nm resolution 
with TPP and demonstrated that, owing to its 
densely packed cage structure, a heat treat- 
ment at 650°C allowed characteristics that 
were virtually indistinguishable from those 
of fused silica. Notably, the transparency 
of the printed parts enabled fabrication of 
micro-optical elements with high smooth- 
ness and optical performance. 

TPP is already used to generate micro-opti- 
cal elements with unlimited design freedom 
compared with conventional techniques. 
However, its commercial application has so 
far been limited to polymers. The results of 
Bauer et al. pave the way for implementing 
glass micro-optics for more demanding tem- 
peratures and environments. Furthermore, 
the increased durability would also benefit 
the field of microfluidics. Glass microfluidic 
devices are a necessity when aggressive, reac- 
tive, and flammable fluids have to be injected 
(often at high pressures and temperatures), 
such as for the investigation of carbon dioxide
(CO₂) sequestration and hydrocarbon
recovery operations. The current multistep 
fabrication processes are complex and in- 
volve the use of chemically reactive plasma 
(reactive ion etching) or liquid chemicals 
(wet etching) to selectively remove the mate- 
rial from a glass wafer. TPP can be a simpler, 
more sustainable alternative, and it opens up 
new 3D designs that are impossible with cur- 
rent techniques. 

The limited firing temperature require- 
ment of the approach demonstrated by Bauer 
et al. allows in principle for the fabrication 
of miniaturized devices (e.g., individual mi- 
crolenses or arrays) directly onto substrates, 
such as optical fibers and chips, which could 
enable process automation and high pre- 
cision. However, the considerable linear 
shrinkage (by ~40%) that occurs upon heat 
treatment might limit the coupling with dif- 
ferent materials. 
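
As a rough illustration of what ~40% linear shrinkage implies for design, the following Python sketch estimates how much a part would need to be oversized before firing. It assumes isotropic (uniform) shrinkage, which the text does not state, and the function names are hypothetical.

```python
def printed_size_for_target(target_size, linear_shrinkage=0.40):
    """Size to print so that the fired part ends up at target_size.

    Assumes the same fractional shrinkage in every direction.
    """
    if not 0.0 <= linear_shrinkage < 1.0:
        raise ValueError("linear_shrinkage must be in [0, 1)")
    return target_size / (1.0 - linear_shrinkage)


def volume_shrinkage(linear_shrinkage=0.40):
    """Fractional volume loss implied by a given isotropic linear shrinkage."""
    return 1.0 - (1.0 - linear_shrinkage) ** 3


if __name__ == "__main__":
    # Hypothetical microlens that should measure 60 micrometers after firing.
    print(printed_size_for_target(60.0))
    print(volume_shrinkage())
```

Under these assumptions, a feature meant to measure 60 micrometers after firing would be printed at about 100 micrometers, and the part would lose roughly 78% of its volume. This is one way to see why coupling to a substrate that does not shrink could be difficult.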

Virtually all mammalian physiological
functions fall under the control of an
internal circadian rhythm, or body clock.
This circadian rhythm is governed by master
neural networks in the hypothalamus that
synchronize the activity of peripheral clocks in cells
throughout the body (1). Environmental
perturbations that are a regular part of 
modern life, such as artificial light and in- 
ternational travel, can disrupt circadian 
rhythms, leading to adverse consequences 
for mental and physical health (2). On page 
972 of this issue, Tu et al. (3) report that 
primary cilia-mediated Sonic Hedgehog 
(SHH) signaling allows cells in the master 
circadian clock to maintain synchronization 
and control circadian rhythmicity in mice, 
identifying an unexpected functional role 
for this developmental regulator. 

The master circadian pacemaker re- 
sponsible for regulating our daily rhythms 
is located in the suprachiasmatic nucleus 
(SCN) in the anterior hypothalamus. The 
cells that make up this pacemaker main- 
tain intercellular coupling of molecular 
circadian rhythms, ensuring synchrony of 
SCN neurons. Robust clocks keep time us- 
ing redundant mechanisms, and the SCN 
is no exception. Signals that promote cel- 
lular synchrony include paracrine signal- 
ing by fast neurotransmitters and multiple 
neuropeptides as well as gap junction- 
dependent electrical coupling (4). This cel- 
lular synchrony ensures the robust output 
of the central clock and renders it resistant 
to signals that reset peripheral clocks. 

Primary cilia are elongated organelles 
that are expressed on the surface of many 
cell types, including neurons. They act as 
mechanosensors and also function as or- 
ganizing centers for transducing a broad 
range of extracellular signals. Receptors 
and signal transduction proteins are transported
between the base and tip of the
cilium by intraflagellar transport (IFT) (5), 
leading to the assembly of multiprotein 
complexes that contain receptors for many 
secreted factors. These include Smoothened 
(SMO) co-receptors for SHH signaling dur- 
ing development, as well as many G pro- 
tein-coupled receptors (6). The assembly of 
primary cilia is regulated by the cell cycle 
during development, and defects in cilia as- 
sembly lead to a range of syndromic human 
genetic disorders called ciliopathies (7). 

Tu et al. demonstrate that primary cilia-dependent
SHH signaling in adult SCN neuronal populations
expressing the neuropeptide neuromedin S (NMS+)
plays a critical role in regulating circadian
rhythms by maintaining intercellular coupling of
cellular oscillators. These NMS+ neurons, which are
crucial for SCN synchrony (8), exhibit circadian
phase-dependent changes in cilia number and length,
unlike other tissues. Selective deletion of IFT
genes in NMS+ neurons led to rapid light-induced
shifts in molecular rhythms, accelerated activity
shifts in jet lag-like conditions, and reduced
intercellular coupling in the SCN. These findings
mirror disruptions in neuropeptide signaling or
NMS+ neuronal function. Tu et al. found that these
defects in circadian rhythmicity upon disruption of
primary cilia in NMS+ neurons are, at least in
part, due to defects in SHH signaling. SHH
signaling activated by SMO in the cilia of NMS+
neurons exhibits circadian rhythmicity. Blocking
SHH signaling in NMS+ neurons phenocopied the
cellular, molecular, and behavioral defects that
are observed after the disruption of IFT genes.
Crucially, all SHH-dependent effects on circadian
clock function were dependent on the expression of
IFT genes (see the figure).

The demonstration of a central role for 
SHH signaling in controlling central clock 
function in mice is surprising because SHH 
has been almost exclusively studied in the 
context of development. SHH is essential 
for the development and specification of
many brain structures during embryogen- 
esis, including the SCN (9), and it also regu- 
lates axonal targeting, dendrite formation, 
and synaptogenesis (10). An ongoing role 
for SHH signaling in the adult SCN raises 
several important questions. It is unclear 
what cells are the relevant source of SHH or 
how its synthesis and release are regulated. 
Primary cilia regulate many other classes of
extracellular signaling, such as Notch, Wnt,
Hippo, and mammalian target of rapamycin
(mTOR) pathways, often through receptor-independent
mechanisms. Thus, it is unclear whether other
extrinsic factors might contribute to controlling
SCN function. Additionally, the study of Tu et al.
suggests that at least a subset of ciliopathy
patients may show circadian defects that resemble
those seen in IFT mutant mice. Because ciliopathies
are syndromic and often present with blindness and
intellectual disability (11), these more obvious
phenotypes may have masked defects in clock
function that should be readily detectable with
appropriate behavioral studies.

Regulating cilia signaling throughout the day
The master circadian pacemaker in the
suprachiasmatic nucleus (SCN) contains neuromedin
S-expressing (NMS+) neurons that have primary
cilia. The number and length of these cilia change
throughout the day, which alters Sonic Hedgehog
(SHH) signaling through Smoothened (SMO)
co-receptors expressed on the cilia. When this
signaling is disrupted, the cellular oscillators in the
SCN become uncoupled, which affects circadian
rhythmicity in mice.

Although components of the SHH sig- 
naling pathway are expressed in mature 
neurons (12), the study of Tu et al. provides 
convincing evidence that SHH actively reg- 
ulates neuronal activity independently of 
its well-characterized developmental func- 
tions. This suggests that SHH signaling may 
be more broadly important in controlling 
neuronal function and identifies potential 
mechanisms by which ciliopathies could dis- 
rupt brain function. However, this discovery 
raises some important caveats about the po- 
tentially exciting translational applications. 
In the study of Tu et al., pharmacological 
antagonists of SHH reduce coupling among 
SCN neurons and render them susceptible 
to rapid resetting by light, whereas SHH 
agonists enhance cellular synchronization 
in mice. Although this suggests that drugs 
targeting SHH signaling could be relevant to 
conditions ranging from travel-induced jet 
lag to aging-related sleep disorders (13, 14), it 
also raises the possibility of broad and unex- 
pected side effects. Therefore, caution must 
be exercised in the development and appli- 
cation of SHH-targeting drugs. Nonetheless, 
the work of Tu et al. provides critical new 
insight into how central clock function is 
regulated and demonstrates an unexpected 
role for a key regulator of brain development 
in controlling neuronal function in adults. 

Multiorgan imaging unveils the intertwined nature of the human heart and brain

By Julia Sacher and A. Veronica Witte


Big brain-mapping initiatives such as
UK Biobank (1), NeuroCharge (2),
and Enigma (3) are transforming the 
methods used to explore neurosci- 
ence. However, the focus on imaging 
the brain as a singular entity often 
ignores the intricate interplay with the rest 
of the body. On page 934 of this issue, Zhao 
et al. (4) delve into multiorgan magnetic 
resonance imaging (MRI) data from over 
40,000 individuals to examine the connec- 
tion between heart traits and measures of 
brain structure and function. They identify 
multiple genetic links between distinct as- 
pects of cardiovascular function and brain 
health. By offering a multidimensional 
analysis of heart-brain connections, this 
study could contribute to the development 
of personalized disease risk prediction.

Zhao et al. used deep machine learning, a
type of artificial intelligence (AI), to analyze 
cardiac MRI phenotypes. This approach 
allows for advanced complex modeling of 
health-related and disease-related metrics. 
They extracted 82 cardiac and aortic traits, 
such as mass, area, volume, wall thickness 
and pumping efficiency, that correlated with 
clinical markers of heart anatomy, function, 
and health (5). Specific heart traits covaried 
with specific MRI-derived brain traits. For 
example, greater myocardial wall thickness 
was associated with larger subcortical brain 
volumes, and smaller distal aortic area was 
associated with differences in (pre)frontal 
and hippocampal volumes, and with lower 
global and regional measures of white 
matter microstructural coherence (see the 
figure). In addition, Zhao et al. reported 
heritability and genome-wide associations 
for the heart traits, which included loci as- 
sociated with complex body and brain traits 
and diseases such as stroke, dementia,
Parkinson’s disease, schizophrenia, bipolar
disorder, and eating disorders.

Brain features correlate with heart traits
Brain regions in which magnetic resonance imaging features showed statistically significant associations with heart morphology are color-coded according to the degree
of significance. (Left) Thicker muscle walls of the heart were associated with larger gray matter volumes of subcortical regions including the thalamus, caudate, putamen,
hippocampus, and amygdala. (Middle) The size of the distal aortic area was associated with gray matter volume of the prefrontal cortex and hippocampus. (Right) A
smaller distal aortic area was associated with higher fractional anisotropy, a measure of the coherence of connecting fiber tracts, in major white matter tracts including
the corpus callosum, corona radiata, and uncinate fasciculus.

Zhao et al. also leveraged known ge- 
netic underpinnings of disease to test for 
a causal relationship between heart and 
brain traits. They found that neurologi- 
cal diseases were significantly more com- 
mon among participants with genetic risk 
of specific aortic traits than in partici- 
pants without those risk alleles. Similarly, 
changes to left ventricular radial strain 
(heart pump efficiency) were more com- 
mon among participants carrying genetic 
variants associated with risk of sleep ap- 
nea than noncarriers. On the basis of their 
findings, Zhao et al. propose distinct, re- 
ciprocal pathways between heart and brain 
function and suggest that these pathways 
could be exploited to provide biomarkers 
of disease risk and progression. However, 
several methodological and conceptual 
challenges will need to be addressed in fu- 
ture studies. 

Multivariate analyses such as those of 
Zhao et al. involve a myriad of data points 
and nearly endless possible combinations 
of variables, which means that establish- 
ing meaningful relationships can be com- 
plicated by confounders, negligible effect 
sizes, and multiple testing (6). Zhao et al. 
enhanced the reliability of their reported 
results in several ways: They took into con- 
sideration a well-thought-out set of pos- 
sible confounders, applied rigorous thresh- 
olds to define statistical significance, and 
ran replication analyses in a separate data- 
set. Future studies will need to use compu- 
tational analysis to gauge the relative level 
of support that data offer for competing 
hypotheses (7), thus overcoming some of 
the limitations. 
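
To make the multiple-testing concern concrete, the following minimal Python sketch applies a Bonferroni correction, one common way of tightening a significance threshold when many hypotheses are tested at once. It is not the procedure used by Zhao et al.; the function name and the p-values are hypothetical illustrations.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that stays below alpha divided by the number of tests."""
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    threshold = alpha / n_tests
    return [p < threshold for p in p_values]


# Hypothetical p-values from four heart-brain association tests.
p_values = [0.001, 0.02, 0.04, 0.0001]
flags = bonferroni_significant(p_values)

if __name__ == "__main__":
    print(flags)
```

With four tests, the per-test threshold drops from 0.05 to 0.0125, so results that look significant in isolation (here 0.02 and 0.04) no longer pass. With thousands of trait pairs, the threshold becomes far stricter still.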

The fine-grained analysis of cardiac
MRI data by Zhao et al. was possible be- 
cause of their use of vision-inspired AI. 
Recent advances in such methods could 
revolutionize medical image analysis by 
predicting an individual’s diagnosis with 
unprecedented accuracy (8). Therefore, 
future studies could make further use of 
cardiac and brain MRI datasets by training 
AI algorithms to predict disease progres- 
sion from raw or preprocessed images, per- 
haps in combination with phenotypic data. 
However, whether the neural networks un- 
derlying these AI predictions rely on bio- 
logically meaningful features of the images 
and not, for example, on image artifacts, 
unknown data manipulation, or acquisi- 
tion bias is often not clear. This limitation 
might explain a lack of clinical implemen- 
tation so far, but explainable AI tools (9) 
can help solve these issues. 

Unfortunately, AI tools often reflect the 
discriminative nature of society against 
specific groups such as non-white indi- 
viduals and women. The reason for AI bi- 
ases lies in the information used to train 
them. Although an estimated 14% of UK 
residents have non-white ancestry, the 
vast majority of UK Biobank participants 
are white. The main analysis by Zhao et al. 
only included participants of white ances- 
try, which has been the default for many 
genetic studies. This means that an op- 
portunity to sample all available variance 
is missed. Application of AI tools without 
inclusive action will continue to exacer- 
bate race and gender bias in biomedical 
sciences and likely result in a reinforce- 
ment of health disadvantages for most of 
the world’s population—namely those who 
are not of Western, educated, industrial- 
ized, rich, and democratic (WEIRD) origin. 


Commendably, Zhao et al. report sex- 
stratified analyses, showing that the 
strength of some heart-brain correla- 
tions differed between females and males. 
Given sex differences in cardiovascular 
and neurodegenerative diseases, includ- 
ing worse outcome after stroke and higher 
rates of Alzheimer’s disease in females (10, 
11), these kinds of analyses are essential. 
Future studies, however, should investigate 
the effect of gender as a social construct on 
the heart-brain relationship. 

For many common noncommunicable 
diseases, the first subclinical signs of dis- 
ease precede the onset of severe symptoms 
by several decades. Therefore, leveraging 
multiorgan imaging data with genetic in- 
formation to improve individualized detec- 
tion of early biomarkers for cardiovascular 
and brain disease could provide an op- 
portunity for highly effective intervention. 
True advances, however, will only be pos- 
sible if previously underserved communi- 
ties are not left out. 

Reforming regulation with an eye toward equity

The Biden administration seeks to change how agencies weigh the effects of regulation

By Robert W. Hahn

In many jurisdictions around the world,
a primary way that scientific and techni- 
cal knowledge and expertise can influ- 
ence society is through government 
regulation. Government agencies regu- 
larly conduct technical analyses of pro- 
posed regulations, which can influence 
whether and how a regulation is imple- 
mented. In the United States, the Biden ad- 
ministration recently proposed some of the 
most dramatic attempts to modernize US 
federal regulatory analysis in decades. 
These reforms, which are broadly consis- 
tent with the president’s objectives of pro- 
moting equity and addressing climate 
change, could substantially change how 
regulatory oversight is performed at the 
federal level and how benefits and costs are 
calculated. Although mostly in draft form 
[and open for public comments (1)], these
reforms, if implemented, could have lasting 
effects on how regulation affects economic 
growth, on the winners and losers from 
regulatory activity, and on how the United 
States and other countries respond to long-term
challenges, notably climate change.

Since 1981, US federal regulatory oversight
has required weighing the benefits
and costs of major federal regulations, typi- 
cally focused on environmental, health, and 
safety regulation. The presidential execu- 
tive orders that have governed regulatory 
oversight across all administrations during 
this period share some common themes. 
For example, they ask that both costs and 
benefits be monetized by using standard 
economic techniques. Furthermore, the 
executive orders generally ask agencies to 
evaluate and select the regulatory alterna- 
tive that maximizes net benefits (defined as 
the difference between benefits and costs) 
subject to various concerns such as legal 
feasibility and equity. The orders also gener- 
ally recognize that not all benefits and costs 
can be quantified and ask for a discussion 
of unquantifiable benefits and costs when 
they may be important. 

The orders have been influenced by
economic principles. Economists gener- 
ally agree that government regulation can 
have an important impact on growth and 
the well-being of consumers. There is also 
agreement that benefit-cost analysis can 
be a useful tool for informing regulatory 
decision-making, especially for large regula- 
tions involving billions of dollars of society’s 
resources (2). The annual cost of the regula- 
tions that are reviewed has been estimated 
to be in the hundreds of billions of dollars, 
with benefits estimated to be a similar or- 
der of magnitude. 


THE PROPOSED REFORMS 

The Biden administration has shown a 
strong interest in modernizing regulatory 
oversight, having issued a memo on this 
topic on the president’s first day in office 
(1). The current proposal to modernize the 
regulatory review process would retain an 
important role for benefit-cost analysis but 
would place much greater importance on 
identifying the winners and losers from reg- 
ulatory activity (and how much they won or 
lost) than had previous administrations. 

There are five key changes that the ad- 
ministration is considering or in the process 
of implementing, two of them contained in 
the executive order “Modernizing Regula- 
tory Review” and three proposed changes 
to “Circular A-4,” which is a draft technical 
document from the Office of Management 
and Budget (OMB) that provides guidelines 
to agencies on how they should do benefit-cost
analysis (1). The reason for separating
these changes is that executive orders are 
typically reviewed by each new administra- 
tion. By contrast, the last time Circular A-4 
was modified was in 2003. 

The two big changes in the executive or- 
der relate to broader public participation, 
and the dollar cutoff point for when a for- 
mal benefit-cost analysis is required. First, 
the executive order attempts to encourage 
greater public participation to promote inclusive
regulatory policy (1). This change
can be interpreted as building on attempts 
by earlier administrations to be more in- 
clusive. For example, the Clinton executive 
order calls for maximizing consultation 
involving the public and state, local, and
tribal officials. It is not clear how the Biden 
administration will build on such efforts or 
whether it will succeed. Researchers have 
an opportunity to help shape and study the 
effectiveness of this process (3). It appears 
that these efforts will be targeted toward 
underserved communities. The administra- 
tion may also want to think about targeting 
general consumers, who may not be well 
represented because the costs of regulation 
on consumers may not be substantial on a 
per person basis but may be substantial in 
the aggregate. 

Second, the executive order changes the 
threshold for a “significant” regulation re- 
quiring formal benefit-cost analysis from 
a $100 million annual economic impact 
to $200 million. This change reflects an 
update for inflation and economic growth 
that has occurred and is unlikely to be con- 
troversial. It may help OMB and regulatory 
agencies focus their resources on the most 
important regulations. 

There are three big changes in Circular 
A-4 on how agencies should do benefit- 
cost analyses. These concern who should 
be counted in a benefit-cost analysis; how 
different groups, such as high-income and 
low-income groups, should be weighed; and 
what discount rate should be used to com- 
pare future benefits and costs with current 
benefits and costs (1). 
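
To show why the choice of discount rate matters, here is a minimal Python sketch of the standard present-value calculation. The rates and the 50-year horizon are illustrative assumptions, not values taken from the text, and the function name is hypothetical.

```python
def present_value(amount, rate, years):
    """Discount a future amount back to today at a constant annual rate."""
    return amount / (1.0 + rate) ** years


if __name__ == "__main__":
    # Illustrative only: value today of a $100 benefit received in 50 years.
    for rate in (0.02, 0.03, 0.07):
        pv = present_value(100.0, rate, 50)
        print(f"rate {rate:.0%}: present value ${pv:.2f}")
```

The lower the discount rate, the more weight a benefit-cost analysis gives to effects that arrive far in the future, which is why this parameter is so consequential for long-term problems such as climate change.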

First, there is a substantial change con- 
cerning who should be counted in a benefit- 
cost analysis. There has been widespread 
agreement that US citizens and residents of
the country should be counted in a benefit- 
cost analysis being reviewed by OMB. This 
is because such policies are supposed to ad- 
vance US interests in the sense of increasing 
net benefits for US consumers. The draft of 
Circular A-4 expands this idea to include all 
citizens of the world in some contexts. For 
example, if it could be argued that US action 
on climate change would promote greater 
international cooperation, this could provide 
a rationale for counting net benefits to all 
citizens of the world, rather than focusing 
on benefits and costs to US residents and 
citizens. Although this provides a plausible 
rationale for considering all citizens of the 
world, Circular A-4 does not appear to consider
the possibility that when the US does
more to provide a global public good, such as
the reduction of greenhouse gas emissions,
other countries may do less.

2 JUNE 2023 * VOL 380 ISSUE 6648 899

INSIGHTS | POLICY FORUM

Although the Circular A-4 draft does 
not require agencies to take a global per- 
spective, the impact could be consider- 
able if they did. For example, consider the 
social cost of carbon, which is the value 
of reducing a ton of carbon in the atmo- 
sphere. The US social cost of carbon has 
been estimated to be about 10% of the 
global social cost of carbon (4). This sug- 
gests that the choice of using a global or 
domestic social cost of carbon in benefit- 
cost analyses for regulatory activities 
could make a big difference in selecting 
policies that maximize net benefits (5, 6). 
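
To make the stakes concrete, the sketch below runs a simple per-ton benefit-cost test under both perspectives. Only the 10% ratio comes from the text (4). The global social cost of carbon ($100 per ton) and the abatement cost ($30 per ton) are hypothetical placeholders chosen for illustration, and the function name is likewise an assumption.

```python
# Illustrative sketch: the dollar values are hypothetical, not estimates from the article.
US_SHARE_OF_GLOBAL_SCC = 0.10  # from the text: US value is about 10% of the global value


def net_benefit_per_ton(scc_per_ton: float, abatement_cost_per_ton: float) -> float:
    """Net benefit of abating one ton: avoided damage minus abatement cost."""
    return scc_per_ton - abatement_cost_per_ton


global_scc = 100.0                                  # hypothetical global value, $/ton
domestic_scc = US_SHARE_OF_GLOBAL_SCC * global_scc  # implied domestic value, $/ton
abatement_cost = 30.0                               # hypothetical cost of the rule, $/ton

global_net = net_benefit_per_ton(global_scc, abatement_cost)
domestic_net = net_benefit_per_ton(domestic_scc, abatement_cost)

print(f"Global perspective:   net benefit {global_net:+.2f} $/ton")
print(f"Domestic perspective: net benefit {domestic_net:+.2f} $/ton")
```

Under these assumed numbers the same rule passes a benefit-cost test from a global perspective and fails it from a domestic one, which is the sense in which the choice of perspective "could make a big difference."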

Second, the administration proposes to
make a quantitative change to promote
equity. It notes that a low-income person
may get more satisfaction or happiness
from an additional dollar than a high-income
person. Thus, different welfare weights
might be applied to these individuals in
measuring costs and benefits (7). Such
welfare weights are becoming more widely
used in both academic research and policy
circles, but they are not yet broadly
applied (8). Part of the problem lies in
agreeing on what the precise weights
should be. The proposed change would be
consistent with what was discussed in
earlier policy documents that govern
regulation outside of the United States,
such as the “Green Book” in the UK (9).

What is different in the US
context is that OMB introduces 
a formula to compute welfare 
weights. Although OMB does not require 
that agencies use this formula, their choos- 
ing to do so could have a big impact on 
policies that are selected. According to 
OMB’s guidance, the welfare weight varies 
as a percent of median annual household 
income, with the welfare weight decreas- 
ing as income increases (see the figure). 
The welfare weight at the median annual 
household income, which in the United 
States is about $71,000, is set to 1. A family 
that has an income of 50% of the median 
would count 2.6 times as much as the me- 
dian family in the benefit-cost analysis. By 
contrast, a family that has an income 50% 
more than the median would count 57% as 
much as the median family in the analysis. 
In the extreme case in which a family has 
zero income, the welfare weight is infinite, 
which defies common sense. 
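
The figures quoted above are consistent with an isoelastic weight of the form w = (income / median income)^(-e). The sketch below assumes that form with e = 1.4. Both the functional form and the value of e are assumptions here, chosen because they reproduce the 2.6 and 0.57 figures; the function and constant names are illustrative.

```python
import math

MEDIAN_INCOME = 71_000.0  # approximate US median household income cited in the text
ELASTICITY = 1.4          # assumed; it reproduces the 2.6 and 0.57 weights quoted above


def welfare_weight(income: float, median: float = MEDIAN_INCOME,
                   elasticity: float = ELASTICITY) -> float:
    """Isoelastic welfare weight: 1 at the median, rising as income falls."""
    if income < 0:
        raise ValueError("income must be non-negative")
    if income == 0:
        return math.inf  # the weight diverges as income approaches zero
    return (income / median) ** (-elasticity)


for share in (0.5, 1.0, 1.5):
    weight = welfare_weight(share * MEDIAN_INCOME)
    print(f"{share:.0%} of median income -> weight {weight:.2f}")
```

The zero-income branch makes the article's objection explicit: under this form the weight grows without bound as income falls, so any analysis using it would need some floor on income.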


Introducing these welfare weights could
justify regulations whose primary focus is
the redistribution of wealth. For example,
transferring a dollar from a group whose
income is 150% of the median to one whose
income is 50% of the median would result in
net benefits of $2.03 ($2.60 minus $0.57).
Moreover, using this weighting can lead to
a different ordering of the net benefits of
programs compared with a standard benefit-
cost analysis, which uses a welfare weight of
one for all groups. In the context of climate
change, applying such weights while also
expanding analysis to consider benefits
accruing to those outside the United States
would imply that the net benefits to low-income
developing countries would be weighted
more highly than net benefits accruing to US
citizens. Such a weighting scheme, although
it may align with what many consider a just
approach to addressing historical impacts
of US actions on the rest of the world, could
nonetheless raise political concerns.

Figure: Welfare weight increases as income decreases.
The curve reflects the equation in proposed Circular A-4
(p. 65) for determining welfare weights. Vertical axis:
welfare weight.
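
The transfer arithmetic and the reordering claim can be checked with a short sketch. It assumes an isoelastic weight with elasticity 1.4, an assumption chosen because it reproduces the 2.6 and 0.57 weights in the text. The two programs and their payoffs are hypothetical, invented only to show how a ranking can flip.

```python
ELASTICITY = 1.4  # assumed; it reproduces the 2.6 and 0.57 weights quoted in the text


def weight(income_share_of_median: float) -> float:
    """Welfare weight for a group whose income is a given share of the median."""
    return income_share_of_median ** (-ELASTICITY)


# A $1 transfer from a group at 150% of the median to a group at 50% of it.
rounded_gain = 2.60 - 0.57               # the article's arithmetic with rounded weights
exact_gain = weight(0.5) - weight(1.5)   # the same calculation with unrounded weights

# Hypothetical programs: (net benefit to the 50% group, net benefit to the 150% group).
programs = {"A": (10.0, 100.0), "B": (40.0, 20.0)}


def unweighted_net(payoffs):
    low, high = payoffs
    return low + high


def weighted_net(payoffs):
    low, high = payoffs
    return weight(0.5) * low + weight(1.5) * high


best_unweighted = max(programs, key=lambda name: unweighted_net(programs[name]))
best_weighted = max(programs, key=lambda name: weighted_net(programs[name]))

print(f"Transfer gain: {rounded_gain:.2f} (rounded weights), {exact_gain:.2f} (unrounded)")
print(f"Preferred program, standard analysis: {best_unweighted}")
print(f"Preferred program, weighted analysis: {best_weighted}")
```

With unrounded weights the gain from the transfer comes out slightly above the $2.03 obtained from the rounded figures. In the hypothetical comparison, program A wins under a standard analysis and program B wins once the weights are applied.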

Third, the recommended discount rate, 
which converts estimates of future mon- 
etized benefits and costs into current ben- 
efits and costs, is set substantially lower. 
The 2003 version of Circular A-4 advised 
using discount rates of 3 and 7%, with 7% 
suggested for the base case. The 3% num- 
ber was estimated on the basis of the real 
rate of return on US long-term government 
debt. Following the same approach used to 
estimate the original 3% number, but using 
the last 30 years of data, the draft Circular 
A-4 suggests using a real discount rate of 
1.7%. [The draft Circular A-4 also suggests
accounting for the impacts of a regulation
on capital formation by using a different,
and more economically defensible,
approach to measuring impacts than was
used in the 2003 Circular A-4 (10)]. To see
how this change might matter, consider a 
$1 benefit that accrues in 10 years from an 
investment. With a 1.7% rate, that dollar 
would be worth $0.84 today; at a 3% rate, 
that dollar would be worth $0.74 today; and 
at a 7% rate, that dollar would be worth 
$0.51 today. The lower the discount rate, 
the more the future benefits will be worth 
in today’s dollars, meaning projects or regu- 
lations that deliver those future benefits 
will look more attractive. Because nearly 
all regulations have upfront costs and de- 
liver benefits over time, the upshot of this 
proposed change is to make regulation look 
more attractive in terms of a 
benefit-cost test. This is par- 
ticularly true for issues, such as 
climate change, in which ben- 
efit streams accrue over a long 
time period and may increase 
over time. For example, a lower 
discount rate would tend to in- 
crease the social cost of carbon, 
which would make regulations 
that reduce carbon dioxide 
emissions or store carbon diox- 
ide more attractive. 
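
The discounting arithmetic in this paragraph follows the standard present-value formula, PV = amount / (1 + r)^t. A minimal sketch, assuming annual compounding (the function name is illustrative), reproduces the three values in the text and shows how the gap widens over a longer horizon:

```python
def present_value(amount: float, rate: float, years: int) -> float:
    """Value today of an amount received after a number of years, compounded annually."""
    return amount / (1.0 + rate) ** years


# The article's example: $1 received in 10 years under each discount rate.
for rate in (0.017, 0.03, 0.07):
    print(f"{rate:.1%} rate: $1 in 10 years is worth ${present_value(1.0, rate, 10):.2f} today")

# A longer horizon, as with climate benefits; the 100-year span is an illustrative choice.
ratio = present_value(1.0, 0.017, 100) / present_value(1.0, 0.07, 100)
print(f"At 100 years, a 1.7% rate values the benefit about {ratio:.0f} times more than 7%")
```

Because the discount factor compounds, the choice of rate matters far more for benefits that arrive decades out than for those that arrive within a few years.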

The impacts of the five key changes listed
above are difficult to gauge. Assuming that
they are implemented, in the short term
agencies are likely to put more effort into
meeting with the intended beneficiaries
of regulation. We are likely to see more
effort devoted to distributional analysis,
which up to this point has been the
exception rather than the rule (11, 12).
This could lead to better-informed
regulation because decision-makers
may use more disaggregated information
on the benefits and costs of particular
regulations. We will also likely see the passage of
more stringent regulations to curb
greenhouse gas emissions because of
the use of lower discount rates. However,
there will inevitably be trade-offs. It is likely
that this administration will trade off
narrow efficiency objectives (defined in terms
of a traditional benefit-cost test) against
broader concerns with redistribution
because of its focus on equity. Over a decade
or two, it is difficult to know which of these
reforms will last. If history is a guide,
reforms in Circular A-4 may be more durable.


HOW RESEARCHERS CAN HELP 
There are many ways in which academics 
might support the effort to modernize
regulatory review. A good place to start is the
basic policy framework. There has been a 
“benefit-cost analysis consensus” among 
economists in the sense of using benefit- 
cost analysis as a key input, if not the key 
input, in regulatory decision-making (2). 
It is worth asking, as the Biden adminis- 
tration does, whether other issues such as 
equity should be included and, if so, how. 
For example, should equity analyses be 
kept separate from standard benefit-cost 
analyses? In the current proposal, equity 
concerns could be included in benefit-cost 
analyses through the use of equity weights 
but can also be addressed separately with 
their own measures. Separating equity 
analyses has the advantage of retaining a 
clear benchmark for comparison for the 
benefit-cost analysis while providing po- 
tentially useful information for decision- 
makers on potential winners and losers 
from a policy. 

The current regulatory proposals would 
evaluate equity impacts at the level of an in- 
dividual regulation, but this may not be the 
best approach from a societal point of view. 
One might argue with respect to income 
groups that ideally, it would be better to 
evaluate the distributional impacts of regu- 
lation for all regulations passed in a given 
year, or even consider the distributional 
impacts of all government policies (such as 
taxes, subsidies, and regulation). “Losers” 
on some policies may be “winners” on oth- 
ers; thus, reviewing equity impacts at the 
level of individual policies could mean that 
regulatory interventions achieve lower net 
benefits (as measured with conventional 
benefit-cost analysis) than they would if the 
distributional impacts of such interventions 
were considered at a higher level of aggre- 
gation (an overall smaller “pie” with the 
same level of redistribution). 

Another area relates to modeling how 
policy gets made. Totally absent from the 
OMB discussion is the notion that agencies 
are not disinterested players in the policy 
process [Breyer suggests that agencies may 
have “tunnel vision” (13)]. For example,
agencies may have a tendency to overstate 
benefits or understate costs for policies that 
they prefer, and existing oversight proce- 
dures may not adequately reduce such bias. 
One indirect way of countering this bias, 
albeit crude, might be to raise the discount 
rate that is required for projects to pass a 
benefit-cost test. My point is not to advocate 
for this policy but to note that a political 
economy framing of the regulatory problem 
could lead to different policy prescriptions. 

Last, the regulatory oversight policy 
framework for the past several decades has 
divided federal regulations into two cat- 
egories: those that get serious scrutiny and 
those that do not (based on the threshold
proposed to be raised from $100 million 
to $200 million). An alternative framing, 
and one used by the UK and the European 
Union, is to suggest that analysis should be 
proportional to the size and importance of 
the issue. This “principle of proportionality” 
offers a different way of approaching regu- 
latory oversight that is more finely tuned 
than the current approach. It implicitly rec- 
ognizes that a cutoff is arbitrary and that 
several different levels of effort may be ap- 
propriate for different types of regulations. 
This could include no analysis, a “back-of- 
the-envelope” benefit-cost analysis, or a 
more formal analysis that is reviewed by 
the Office of Information and Regulatory 
Affairs within the OMB. 




In addition to rethinking policy frame- 
works, academics can help in several areas 
related to policy implementation. There is a 
growing literature that evaluates how ben- 
efit-cost analysis for regulations has been 
implemented by government agencies. The 
research shows that agencies do not always 
implement such analyses in accord with 
OMB directives. Part of the problem may lie 
with resources. This is important because 
the equity analysis that OMB is requesting 
under the Biden proposals will require ad- 
ditional resources. 

Even if agencies do have adequate re- 
sources, there is a question of how to make 
the best use of them to influence the regula- 
tory policy process. Some critics of the use 
of benefit-cost analysis have argued, for ex- 
ample, that such analysis is often done too 
late or simply used to justify agency or ad- 
ministration decisions after the fact. If true, 
this would suggest that early consultations 
between the regulatory agency and OMB 
that rely on a preliminary benefit-cost anal- 
ysis could lead to better policy outcomes. 
Such consultations could help ensure that 
benefit-cost analysis plays a more promi- 
nent role in actual decision-making. 

Academics can help evaluate such chal- 
lenges with implementation and make sug- 
gestions for improvement. Consider three 
examples. One relates to reproducibility. 
Many of the results for benefit-cost analysis 
of federal regulations are not easily repro- 
duced. If there were interest, the govern- 
ment could provide resources along with 
guidance about how this could be done,
including how to address concerns about data
privacy and confidentiality. This could lead
to a more transparent process for public
policy-making that holds decision-makers
more accountable (14). A second example
relates to helping with equity analysis 
(15). Decision-makers currently have very 
limited reliable information on how low- 
income consumers and high-income con- 
sumers respond to different regulations, 
such as electric vehicle subsidies, and thus 
the benefits of interventions by subgroup. 
They also have very limited information on 
how the cost of regulation is apportioned 
among different groups. Last, scholars do 
not have a good understanding of the ac- 
tual impact of regulatory oversight on real- 
world policy outcomes. There is a working 
presumption that regulatory oversight 
makes a difference, but exactly how and 
why are not well understood. 

It remains to be seen how the suite of 
proposed reforms will be implemented. 
Likely policy outcomes in the short term 
include a greater focus on equity and more 
regulations aimed at addressing climate 
change. If the discount rate changes endure, 
they could favor policy interventions with 
benefits over longer time horizons, such as 
those affecting climate change. There is an 
important role that academics can play in 
shaping and evaluating these reforms. 

Summer reading 2023


Indigenous narratives inform an ecologist’s ode to the octopus. An 
underappreciated form of communication takes center stage in a 
linguist’s life’s work. A much-maligned party drug gains respect as a 
therapeutic agent. From a fictional glimpse into the lives of hysteria 
patients in 19th-century Paris, to a fascinating history of Califor- 
nia’s dwindling redwoods, to a soul-searching account of a voyage to 
Antarctica, the books on this year’s summer reading list invite careful 
reflection on topics ranging from physics to codebreaking. Read on 
for reviews written by alumni of the AAAS Mass Media Science
& Engineering Fellows program of nine books with strong science
themes set to publish this summer. —Valerie Thompson 


In a Flight of Starlings 


Reviewed by Robert Frederick¹


In his latest book, In a Flight of Starlings, 
Nobel Prize-winning physicist Giorgio 
Parisi sets himself a task that he admits is 
possible but not easy: to convey both sci- 
entific results and, more importantly, how 
scientists create them. His overall goal is to 
highlight how science and society are inter- 
twined, a coproduction, shaping and being 
shaped by one another. 

Historically, this task has not always
been so challenging. For centuries,
readers of the Royal Society’s Philosophical
Transactions, the world’s longest-running
scientific journal, were encouraged to
replicate experiments for themselves, thereby
enacting the publisher’s motto, Nullius in
verba (“Take nobody’s word for it”). As
science became more specialized and
experiments became more complicated, however,
the need for a new system emerged. In the
1830s, the Royal Society introduced the
peer-review process.


Unfortunately, Parisi writes, this pro- 
cess has led some scientists and science 
communicators to overemphasize re- 
sults, avoiding the more difficult task of 
explaining the underlying evidence and 
analysis, while also failing to stress sci- 
ence’s inherent uncertainty. As a result, 
when new evidence contradicts previously 
peer-reviewed findings, public trust in the 
scientific enterprise wanes. That distrust, 
in turn, can lead to science denialism and 
disastrous consequences. 

The book’s opening chapter on Parisi’s ex- 
perience studying the collective behavior of 
airborne flocks of starlings is an accessible 
tale of trial and error, scientific and techno- 
logical advances, surprise and delight. Why 
a theoretical physicist spent decades study- 
ing these birds is answered by the next few 
chapters: Parisi did not specialize, despite 
receiving advice to do so from fellow CERN 
physicist Martinus “Tini” Veltman, who was 
doing his own Nobel Prize-winning work at 
the time. Parisi argues that it was because 
he studied many things simultaneously that 
he made connections among different fields 
that led to new discoveries. 

The book’s subsequent chapters require 
considerably more patience. In chapter 5, 
for example, readers might struggle to un- 
derstand the mathematical modification 
that Parisi used to develop his theory about 
spin glasses, the work for which he won a 
Nobel Prize in Physics in 2021. 

Parisi writes that his Nobel Prize-winning
discovery happened by accident:
He was researching a mathematical tool 
that he planned to apply to an unrelated 
problem, encountered a conceptual error 
that led the tool to produce incoherent re- 
sults, reworked the math himself, and dis- 
covered the spin glass equations. Indeed, 
several of Parisi’s memorable anecdotes 
bring to mind Louis Pasteur’s famous 
quote about how chance favors the pre- 
pared mind. 

With humility, Parisi also shares 
stories of how his preparation and in- 
tuition have sometimes failed him. He 
devotes an entire chapter, for example, 
to recounting how a series of missteps 
in 1973 led him to shelve an idea rather 
than spend “a moment’s thought” pur- 
suing alternative hypotheses. A few 
months later, three other scientists had 
that same thought and coauthored a 
paper that would go on to win them a 
Nobel Prize in 2004. 

Although Parisi’s stated goal is to ad- 
dress a wide audience, this book speaks 
directly to fellow scientists and to anyone 
who communicates science. We must com- 
municate both results and methods, Parisi 
maintains, all while sharing science’s 
“beauty, importance, and cultural value,” 
lest we share in the responsibility for en- 
couraging science denialism. 


In a Flight of Starlings: The Wonders of Complex 
Systems, Giorgio Parisi, Penguin Press, 2023, 144 pp. 


I Feel Love


Reviewed by Elie Dolgin²


On a sunny September morning in 1975, 
two university students, Carl Resnikoff 
and Judith Gips, boarded a ferry in San 
Francisco Bay. Each swallowed a small 
capsule filled with a crystalline white 
powder. Their afternoon was soon filled 
with laughter, waves of euphoria, and a 
profound sense of compassion for each 
other and all of humanity. The pair were 
the first people identified by name to have 
taken 3,4-methylenedioxymethamphet- 
amine—a drug better known as MDMA, 
molly, ecstasy, or simply “E.” 

Millions of others have since “rolled” on 
MDMA. The drug initially gained popularity 
among practitioners of psychedelic-assisted 
psychotherapy, before being discovered by 
young partygoers in the nightclub and rave 
scenes. Governments around the world then 
cracked down on the compound, leading to 
the rise of an underground drug trade fu- 
eled by a chemistry whiz who synthesized 
kilograms of near-pure MDMA out of a 
converted laboratory in southern Brazil. The 
stash was sold by a priest turned MDMA 
kingpin who, before serving a 7-year sen- 
tence for drug dealing, donated thousands 
of pills (which were then sold for cash) to 
help fund animal toxicity studies intended to 
demonstrate MDMA’s safety.

Science journalist Rachel Nuwer recounts
all this and more in her new book, I Feel
Love, which details the complex and fasci- 
nating saga of how MDMA, a once obscure 
chemical, went on to become a beloved 
party drug, a controversial therapy tool, and 
a powerful symbol of the human desire for 
connection. While the stories and figures 
she describes may not share the same level 
of public recognition as those surrounding 
“Bicycle Day” (the anniversary of chemist
Albert Hofmann’s first intentional LSD trip),
they are no less captivating. And as
regulatory approval nears (pharmaceutical-grade
MDMA will soon be available in Australia as
a treatment for posttraumatic stress disorder,
with authorizations in other countries
expected soon), the need for an improved
understanding and public awareness of the
drug’s potential effects on the brain has
never been more urgent.

Much of the terrain that Nuwer treads 
was previously explored in Michael Pollan’s 
2018 bestseller, How to Change Your Mind, 
which delved into the science, culture, and 
history of “classical” psychedelics such as 
LSD and psilocybin. That influential treatise 
helped raise mainstream consciousness 
about the therapeutic potential of these 
substances and marked a turning point in 
the rise of today’s psychedelic renaissance. 
But as Nuwer writes, “MDMA has its own 
distinct history and compelling cast of char- 
acters, its own unique neurological mecha- 
nisms and potential for both ill and good.” 

I Feel Love thus serves as something 
of an unofficial sequel to Pollan’s literary
landmark, filling in details about a thera- 
peutically promising drug left out of that 
earlier narrative. Scientifically, it picks up 
where Pollan left off, highlighting years of 
additional research into how psychedelics 
rewire the brain so as to create a renewed 
state of childlike openness and suggest- 
ibility, while also underscoring clinical data 
demonstrating the safety and efficacy of 
MDMA as a treatment for everything from 
alcoholism to social anxiety disorder. 

As MDMA stands poised to become a cor- 
nerstone of mental health treatment, readers 
must ask themselves: Are they ready to roll? 


I Feel Love: MDMA and the Quest for Connection in
a Fractured World, Rachel Nuwer, Bloomsbury, 2023, 
384 pp. 


Many Things Under 
a Rock 


Reviewed by Dan Blustein³


What has seven arms, can shape-shift to 
match a tuft of algae, and neutralizes a live 
clam by drilling a hole in its shell and inject- 
ing paralyzing saliva? If you guessed a male 
octopus that just lost an arm to a cannibalis- 
tic female after a failed mating attempt, you 
would be correct! 

Many Things Under a Rock, by ecologist 
David Scheel, includes numerous such dra- 
matic and captivating octopus factoids, but 
it also presents an accessible and nuanced 
exploration of the lives of these intriguing 
invertebrates. The book’s careful scientific 
observation, contextualized with modern 
and historical accounts of the species from 
Western and Native peoples, is an engaging 
read and a refreshing break from the seem- 
ingly steady stream of “sharktopus” thrill- 
ers with which we have been presented in 
recent years. 

Scheel, a cephalopod researcher, quickly 
garners the reader’s trust with his meticu- 
lous description of octopuses in the lab and 
in the wild. We follow along as Scheel’s per- 
spective shifts from an initial fear of these 
mysterious creatures, driven by legends of 
gigantic octopuses wrestling with divers, 
to one of nuanced respect. He details his 
transformation into an octopus expert as 
the book progresses, recounting expeditions 
to the frigid depths of the northern Pacific 
and to southern Australian clam beds as he 
searched for clues about how different octo- 
pus species interact with their environment, 
predators, prey, and each other. 

The octopus is featured in Indigenous stories and cultural items, such as this textile crafted by a Kuna Indian artist.

Exploring the octopus requires a multi-
disciplinary approach, to which Scheel
commits as he weaves together Western
evolutionary history, behavior, ecology, and 
neuroscience with Indigenous ways of know- 
ing. He illustrates the connections between 
octopus biology and Native Alaskan octopus 
histories, for example, by revealing how vari- 
ations in Indigenous language seem to re- 
flect octopus natural history. The root “am-” 
in the Inuit word for octopus, “amikuk,” 
means “skin,” he reveals, a seeming
reference to the key evolutionary adaptation
differentiating octopuses from more-ancient
mollusks: the loss of an external shell and the
emergence of skin that enables swimming.

Warming waters have driven transient 
octopus population booms in Japan and 
England throughout the past 150 years, a 
phenomenon that is also reflected in vari- 
ous Indigenous histories. Scheel connects 
these histories by discussing how commu- 
nities have made sense of local changes in 
octopus abundance. 

A few of the book’s descriptions of oc- 
topus actions and anatomy may be too 
detailed for nonexperts, but such instances 
are infrequent and further reinforce Scheel’s 
precise attention to detail in recounting his 
field observations. Scheel also references a 
range of scientific studies throughout the 
text, including some very recent work, al- 
though readers must rely on a notes section 
at the end rather than in-text citations to 
learn more about this research. 

The word for “octopus” in Eyak—a lan- 
guage native to Southcentral Alaska—is 
“tse-le:x-guh,” which translates literally as 
“many things under a rock.” The book’s title 
is thus a descriptor, not only of an octopus’s 
eight arms sheltered by a protective rock but 
also the many mysteries left to unravel about 
these extraordinary creatures. 


Many Things Under a Rock: The Mysteries of 
Octopuses, David Scheel, Norton, 2023, 320 pp. 


The Hidden History of 
Code-Breaking 


Reviewed by Francisco J. Guerrero⁴


Chock-full of code puzzles for readers to 
solve, Sinclair McKay’s The Hidden History 
of Code-Breaking is an interactive explora- 
tion of the seemingly never-ending arms 
race between codemakers and codebreakers. 
The book’s strengths include its focus on the 
motivations behind code creation and the in- 
dividuals who created some of history’s most 
well-known codes. McKay writes, for exam- 
ple, about the serendipitous moment Samuel 
Morse first conceived of the dots, dashes, and 
spaces that would become Morse code while 
on board a transatlantic ship. Like vessels 
crossing the ocean, Morse imagined words 
going on a similar journey, carried by short 
electrical impulses along very long wires. 
McKay also highlights the stories of individu- 
als who used their intelligence, persistence, 
and creativity to crack codes that stumped 
others for years. This latter group includes 
Alan Turing, whose Bletchley Park team 
eventually cracked the Enigma code used by 
the Nazis during World War II. 

McKay’s writing is clear and engaging, 
making complex concepts and theories 
accessible to readers without oversimplify- 
ing them. In chapter 12, for example, he 
balances technical details and storytelling 
masterfully while exploring the Human 
Genome Project. Here, complex ideas such 
as gene sequencing are woven skillfully 
into tales about solving the mystery of what 
makes us human. 

The book touches on various fields, 
including linguistics, math, history, ar- 
chaeology, literature, biology, and politics, 
and demonstrates how codebreaking has 
influenced these fields throughout history, 
offering a rich and insightful perspective
for readers interested in the intersection
of these fields. In chapter 5, for example,
McKay explores the role of human relation- 
ships in the evolution of codes and ciphers, 
noting how secret lovers have long en- 
coded messages in poetry, songs, and other 
romantic expressions to arrange specific 
dates and encounters. 

The book has shortcomings. The puzzles
presented for readers to solve throughout
the book vary widely in difficulty, a few
contain inherent language and cultural
biases that may make them inaccessible to
some readers, and the instructions for
solving them are not always clear.
Furthermore, the book’s focus on historical 
military and government codes may not ap- 
peal to all readers, especially those inclined 
toward modern cryptography applications. 
The book could also have benefited from 
more detailed explanations or images of 
some codes and artifacts. For example, 
McKay’s description of the Phaistos disc 
would have been clearer if accompanied by a 
pictorial diagram. 

Despite these weaknesses, The Hidden 
History of Code-Breaking is a worthwhile in- 
troduction to the world of codes and ciphers 
that offers a glimpse into the fascinating 
realm of encryption and how codes have 
been used throughout history. Its puzzles 
and historical trivia would make for interest- 
ing summer travel companions. 


The Hidden History of Code-Breaking: The Secret 
World of Cyphers, Uncrackable Codes, and Elusive 
Encryptions, Sinclair McKay, Pegasus, 2023, 400 pp. 


Thinking with Your 
Hands 


Reviewed by Lisa Aziz-Zadeh⁵


How does gesture influence thinking? What 
does it reveal about language and conceptual 
learning? How can we use it to teach, parent, 
and heal? These are among the many
questions Susan Goldin-Meadow discusses in
her thorough and powerful book, Thinking
with Your Hands. The volume synthesizes 
the author’s 50+ years of expertise in gesture 
research, for which, among other awards 
and accolades, she won election into the 
National Academy of Sciences in 2020 and 
was the recipient of the prestigious David E. 
Rumelhart Prize in 2021. 

In the book’s first chapter, Goldin- 
Meadow discusses how gesture can reveal 
whether a child is ready to learn a new con- 
cept. She and her colleagues found, for ex- 
ample, that children will sometimes provide 
a wrong answer to a mathematical question
while simultaneously revealing through 
gesture that they understand the underlying 
concept, a mismatch that can be used as a 
sign that the child is on the cusp of learning 
the idea being taught. But educators must be 
attuned to gesture to maximize this learning 
stage—not recognizing the gesture-speech 
mismatch is a loss of a teachable moment, 
she argues. 

Meanwhile, in chapter 4, Goldin-Meadow 
reveals that children who are not taught 
language—for example, deaf children whose 
hearing parents do not use sign language— 
will often develop language spontaneously. 
By studying the so-called “homesign” ges- 
tures made by such children, researchers 
have determined that humans can develop 
certain language features on their own— 
for example, creating gesture sentences 
that are hierarchically structured. Other 
aspects of language, however, such as the 
use of the passive voice, need to be learned. 
Interestingly, mathematical reasoning ap- 
pears to require more person-to-person 
teaching than does language; apparently 
not all abstract concepts are similar in their 
amenability to self-invention. 

After discussing how gesture can be 
used to improve teaching, parenting, and 
rehabilitation, Goldin-Meadow questions 
whether mediums that do not reveal ges- 
ture—auditory recordings, for example, or 
live or recorded videos that restrict gesture 
space—should be admissible in judicial hear- 
ings. Laboratory studies indicate that an in- 
terviewer’s gestures might introduce biases 
in a witness’s memory and, simultaneously, 
that a witness’s gestures may reveal more 
information than their speech alone. 

Gestures can convey information that speech cannot,
making them critical features of communication.

It is at this point that the true power
of this book emerges: Having convinced
the reader to accept the importance and
impact of gesture, Goldin-Meadow urges
us to question its broader societal implica- 
tions. That she might bring a lay audience 
this far in their appreciation of the vital 
but often overlooked impact of gesture on 
interpersonal communication is a triumph 
of this book. 


Thinking with Your Hands: The Surprising Science 
Behind How Gestures Shape Our Thoughts, 
Susan Goldin-Meadow, Basic Books, 2023, 272 pp. 


The Madwomen 
of Paris 


Reviewed by Stephani Sutherland 


The Madwomen of Paris by Jennifer Cody 
Epstein tells the fictional story of Laure, a 
young woman living in Paris in the late 19th 
century who has been orphaned, separated 
from her sister, and, like so many real women 
of that time, institutionalized with hysteria at 
the Salpêtrière asylum. There, Laure recovers 
and—with few other choices as a penniless 
young woman—stays on to work as a 
nursemaid to other hysterical patients. Her 
determination to be reunited with her sister 
is complicated by the arrival of a mysterious 
new patient, Josephine, who has clearly un- 
dergone a serious trauma but has no memory 
of what has happened to her. 

Josephine becomes a “star patient” of the 
asylum’s head doctor, Jean-Martin Charcot, 
who hypnotizes his hysterical patients to 
study the disease, often in front of a packed 
public audience. Under hypnosis, patients are 
subjected to all sorts of humiliations, includ- 
ing sexual assault. Those who behave badly 
receive “hydrotherapy,” which consists of be- 
ing sprayed with a strong hose, or are thrown 
into the “softs” (padded cells). The worst fate, 
it seems, is to be assigned to the “lunacy” 
wing of the hospital, where patients’ sanity 
is believed to be beyond repair. Josephine is 
placed under Laure’s care, and together they 
survive the horrific conditions of the asylum 
and unravel the mystery of Josephine’s past. 

Readers are informed up front that 
“Though inspired by a real place (the 
Salpêtrière asylum, now a teaching hospital 
in Paris), real events, and real historical 
figures, The Madwomen of Paris is a work 
of fiction.” But a key character of the book, 
Charcot, was a real person, sometimes referred 
to as the “father of neurology.” In addition 
to studying hysteria, the real Charcot 
connected pathophysiology with the symp- 
toms of neurological diseases, allowing him 
to make the first diagnoses of multiple sclero- 
sis and amyotrophic lateral sclerosis, among 
other diseases. Josephine, too, is based in 
part on a real “star” patient—Augustine 
Gleizes—the author explains in a note. 

The book’s blend of fact and fiction may 
leave readers with questions about which 
parts of the story reflect reality and which do 
not. For example, were the conditions truly 
as terrible for institutionalized women at that 
time as the book describes? Cody Epstein 
notes that she has “done what historical nov- 
elists can happily (if perhaps uncomfortably 
for some) do, shifting and omitting some 
events and chronologies, and entirely invent- 
ing others.” 

Other unanswered questions may inspire 
further reading. Hysteria is no longer rec- 
ognized as a medical disorder, for example, 
so what really afflicted the patients in the 
asylum? 

In any case, Cody Epstein has achieved 
her goal of immersing readers in the 
“stranger-than-fiction universe” of late-19th- 
century Paris. At a time when women’s repro- 
ductive rights are under threat and people 
with unexplained medical conditions are 
routinely gaslit, The Madwomen of Paris pro- 
vides a fascinating look back at a condition 
with modern-day resonance. 


The Madwomen of Paris: A Novel, Jennifer Cody 
Epstein, Ballantine Books, 2023, 336 pp. 


Life on Other Planets 


Reviewed by Clare Fieseler 


Aomawa Shields is many things: daughter 
of musicians, boarding school star, trained 
actor, sometimes astronomer, and a person 
with a deeply curious mind. Her 2015 TED 
talk “How we'll find life on other planets” 
propelled her to internet fame and inspired 
the title of her new memoir, Life on Other 
Planets. But Shields should not be defined 
by her alien-seeking research. As she puts it: 
“I am a champion of interdisciplinarity.” 
Shields’s powerfully personal book tells 
the story of a Black woman with two pas- 
sions finding her place in the world. After 
a failed first start as an astrophysics PhD 
student, Shields pursued professional act- 
ing for a decade. She eventually returned to 
the stars—and graduate school—restarting 
a career as an astrobiologist and, later, pro- 
fessor. Her journey is filled with self-doubt 
and serious soul-searching, but here’s what 
is clear: Modern science is still built for the 
easily defined worker. As a scientist-turned- 
journalist myself, I said “yes” out loud 
multiple times as I read how hard it was 
for Shields to be an interdisciplinarian and 
career-pivoter. 

In Life on Other Planets, Shields makes a 
stand against a kind of unbridled scientific 
success that comes at any cost. Hers is a 
story of applying to the US astronaut pro- 
gram three times, rejected each time and 
wiser for it. The time away from her daugh- 
ter would not have been worth it, she real- 
ized. She normalizes a view of the scientific 
career that is always changing. Goals will— 
and should—evolve as we learn more about 
ourselves and the objects we study. 

In one moving scene, Shields describes 
her elation at being invited to lunch by 
Ann Druyan, Carl Sagan’s widow and 
writing partner, where Shields receives a 
warm embrace and mutual respect from 
“the most important person” in Sagan’s 
life. Shields and Sagan are the closest 
of kindred spirits, sharing a propensity 
for polymathy and a passion for science 
communication. As a Black woman, the 
obstacles Shields encountered ascending 
to a Sagan-like professional position feel 
unfair. She describes her experiences in 
touching detail, but she does not inter- 
rogate the external societal structures 
that continue to serve as obstacles for 
women and people of color, not to mention 
polymaths. 

What, if anything, should change so that 
the life of an astronomer-slash-actor is not 
so arduous? The book should be a warning 


Aomawa Shields stands in front of the Arecibo 
Observatory in Puerto Rico in 1996. 


to the scientific community that, still today, 
pays lip service to STEAM (science, technol- 
ogy, engineering, arts, and mathematics) 
while leaving behind the people who do that 
work: scientific artists and artistic scientists. 
Imagining the reader as someone com- 
pelled by the cosmos, like herself, Shields 
gets to the root of it: “What do I want for 
you? I want you to look up and be amazed. 
I want you to feel supported, less lonely and 
afraid, a part of rather than apart from.” We 
may or may not be alone in the Universe, but 
Shields makes a case for togetherness, with 
each other and within ourselves. 


Life on Other Planets: A Memoir of Finding My Place 
in the Universe, Aomawa Shields, Viking, 2023, 352 pp. 


The Ghost Forest 


Reviewed by Bridget Alex 


Evolving some 200 million years ago, red- 
wood trees survived the rupturing of Pangea, 
the meteor that offed the dinosaurs, and 
innumerable natural catastrophes and cli- 
mate swings. Individual trees have attained 
heights of more than 350 feet, trunks with 
30-foot diameters, and 3000th birthdays. 

In the mid-19th century, these primordial 
giants flourished in a 2-million-acre forest 
that stretched along California’s coast from 
the Bay Area to the Oregon border. Today, 
just 4% of this land harbors coast (or Cali- 
fornia) redwoods. The trees’ evolutionary 
cousin, the giant sequoia, ekes by in scat- 
tered groves of the Sierra Nevada. The Ghost 
Forest explores how and why the world’s 
tallest trees were logged nearly to extinction 
in less than two centuries. 

Greg King is an authoritative guide for 
this journey, highlights of which include 
extractive capitalism, specious regulations, 
and shady dealings. A journalist and envi- 
ronmental activist, he also enters the plot 
as a protagonist who led campaigns in the 
1980s and 1990s to save declining old-growth 
forests. The book interweaves King’s experi- 
ences from the front lines of eco-activism 
with his decades of archival research into the 
forces behind redwood decimation. 

The history unfolds in three eras. During 
the late 1800s, private companies illegally 
acquired lands inhabited by coast redwoods. 
After World War I, loggers liquidated these 
forests and sold the prized timber to make 
water and oil pipes, railway ties, shingles, 
telephone poles, and other infrastructure 
for the growing country. Near the end of the 
20th century, corporations profited once 
more from the swindled land when they sold 
redwood stands back to the government. 

The book triumphs as a comprehensive 
accounting of events and entities that ush- 
ered in this irreplaceable loss. Synthesizing 
decades of sleuthing, King reveals un- 
expected culprits such as the Save the 
Redwoods League—an organization cre- 
ated by business titans wanting to protect 
scenic redwood stands so the public would 
be placated but logging could continue out 
of sight. He also tactfully grapples with vile 
redwood protectors: Eugenicists and Nazi 
supporters considered the trees to be “apex 
species” worthy of life. 

For casual readers, some portions will 
drag as King names historical individuals 
and companies that figure only briefly and 
situates tree groves within California wa- 
tershed geography. The text also becomes 
oversaturated with superlatives for red- 
wood size, forest acres, and timber planks 
and payouts—but how else does one de- 
scribe astronomical profits made from fell- 
ing vast swaths of the world’s tallest trees? 

Patient readers will be rewarded because 
the pace quickens to that of a page-turner 
when King recounts tales of harrowing ac- 
tivism in the 1980s and 1990s. Suffering ar- 
rests, death threats, and FBI infiltrators, he 
and colleagues staged heroic protests, once 
even scaling the Golden Gate Bridge. 

Although set in the past, the book is 
urgently of-the-moment. Early perpetra- 
tors of what is now called “greenwashing,” 
corporate leaders hatched the Save the 
Redwoods League at Bohemian Grove, the 
elite resort recently in headlines because of 
Supreme Court Justice Clarence Thomas’s 
visits, for example. More broadly, the sys- 
tems of power and wealth that targeted the 
redwoods and their protectors continue to 
inflict violence on Earth’s defenders world- 
wide. If the trends of global warming and 
deforestation hold steady, The Ghost Forest 
may eventually read as a prequel to the 
ghost planet. 


The Ghost Forest: Racists, Radicals, and Real Estate 
in the California Redwoods, Greg King, PublicAffairs, 
2023, 480 pp. 


The Quickening 


Reviewed by Elizabeth Case 


In The Quickening, Elizabeth Rush takes 
readers to the precipice of the climate 
crisis. Aboard the Nathaniel B. Palmer, an 
American icebreaker, Rush and a crew of 
scientists, journalists, and support staff set 
bow and stern in front of Thwaites Glacier 
for the first time in history, sampling water 
in unnamed bays; collecting sediments, 
shells, and bones; and sending submarines 
under the glacier to photograph evidence 
of past rates of glacial retreat. 

The Quickening is framed as a play in 
four acts. The cast consists of the scientists, 
crew, two other journalists, and the glacier 
(although, interestingly, not Rush herself). 
Interspersed between Rush’s monologues, 
her shipmates tell stories about their 
births, the reasons they do the work they 
do, and the lessons they learn from it. The 
glacier, of course, never speaks directly. 
Instead, it calves, groans, and creaks—com- 
muniques left open to our interpretation. 

Rush’s descriptions of the ice and ocean 
transport readers to the ship’s bridge. 
The first iceberg she sees is “dove gray,” 
“whipped meringue,” “a milky-aquamarine 
spire,” and “the pearly luster of kyanite.” 
Later, between floes, “where the nearly fro- 
zen water shows,” she recalls “a turquoise 
Logging and disingenuous conservation efforts have substantially reduced populations of California redwoods (Sequoia sempervirens). 


so deep it torques all it touches into some- 
thing new.” 

The Quickening is an intensely personal 
story—a memoir that is also an act of pro- 
cessing as Rush works through her decision 
to have a child as the climate crisis looms. 
The book’s title references the sensation a 
mother experiences when a baby first moves 
in utero—a nod not just to Rush’s own ef- 
forts to become pregnant but also to the mo- 
ment scientists noticed that Thwaites had 
begun to respond to climate change. 

As a scientist, I have also traveled to 
Thwaites. I am planning to go again this 
year. And because, as Rush points out, no 
pregnant people are allowed to work in or 
around Antarctica, each year I go is another 
year my partner and I must wait to start a 
family, another year of uncertainty about 
whether the desire for a child is selfish, bio- 
logical, logical, or loving. The Quickening 
helped me orient these questions, although, 
of course, it could not answer them. 

I did wonder about the sense of agency 
Rush grants to Thwaites and whether it 
undermines humankind’s responsibility for 
the planet’s future. At one point, she asks, 
“Will Miami even exist in one hundred 
years?” and answers, “Thwaites will decide.” 
Rush is referring to how much sea level rise 
Thwaites will contribute as it melts. But it 
is us—and really a small subset of us—who 
will decide. We may already have, if the gla- 
cier has already begun unstably collapsing. 

This response aside, The Quickening is a 
poignant, necessary addition to the body of 
Antarctic literature, one that centers—with- 
out glorifying—motherhood, uncertainty, 
community, vulnerability, and beauty in a 
rapidly melting world. 


The Quickening: Creation and Community at the 
Ends of the Earth, Elizabeth Rush, Milkweed Editions, 
2023, 424 pp. 
Improve energy-efficient 
construction in China 


In the past few decades, China’s construc- 
tion industry has undergone a swift expan- 
sion, increasing its energy use and carbon 
emissions (1). In an effort to meet climate 
goals (2), China has formulated guidelines to 
minimize carbon emissions in new develop- 
ment (3). However, the construction indus- 
try lacks effective regulatory standards and 
assessments to ensure energy efficiency. 

When inspecting the carbon impact 
of construction, the Chinese government 
primarily refers to the Green Building 
Evaluation Standard and the Technical 
Guidelines for Energy Efficiency Evaluation 
and Labeling of Civil Buildings, which 
comprehensively assess variables such as 
resource management, land planning, and 
building features (such as air conditioning 
units, light and sound impacts, and green 
spaces) (4, 5). However, these standards rely 
on outdated baselines. For instance, the 2022 
Beijing Winter Olympics have been hailed 
as the first Olympics in history to achieve 
carbon neutrality (6), but the construction 
of the Olympic venues adhered to 2014 regu- 
lations for green buildings (7). Since 2014, 
carbon mitigation strategies have improved, 
and lower emission options could have been 
used for energy systems, construction mate- 
rials, and operational strategies (8, 9). 

The current standards also lack a frame- 
work for monitoring and appraising the 
carbon footprint throughout the entire life- 
cycle of buildings. The impact of new infra- 
structure includes not only design and con- 
struction but also the emissions produced 
while it is in use—in some cases spanning 
decades—and its demolition and disposal 
(10). Many projects in China now claim to 
qualify as “low-carbon construction” (11) (a 
more efficient classification than “green con- 
struction”) by citing energy-saving technolo- 
gies used during limited stages, such as con- 
struction or operation. To accurately classify 
projects as low carbon emitters, China needs 
a comprehensive evaluation system. 

To maximize the use of low-carbon 
construction equipment, materials, and 
methods, the central government should 
standardize the design and construction 
of energy-efficient buildings. Local govern- 
ments and relevant agencies should rigor- 
ously review project applications, strengthen 
construction quality monitoring, and 
increase the frequency of spot checks dur- 
ing the building operation phase. Finally, 
an evaluation system that determines the 
carbon emissions over the entire lifespan 
of new infrastructure, similar to the US 
Evaluation System for Zero Net Carbon 
Building Performance (12), should inform 
development decisions. 

Xinbo Xu and Zhiwei Lian* 
Department of Architecture, Shanghai Jiao Tong 
University, Shanghai 200240, China. 
Researchers need better 
access to US Census data 


The US Census Bureau decided to adopt a 
differential privacy framework by adding 
noise to the 2020 Census data to improve 
the confidentiality of individual Census 
responses (1). To protect the collected data, 
random noise was added to the tabulated 
statistics, and the data and noise were 
stored together in a Noisy Measurement 
File (NMF). The NMF is critical for under- 
standing biases in the Census data and for 
performing valid statistical inferences, as 
it potentially allows data users to adjust 
for the noise in their analyses. However, in 
2021, the Census Bureau released only the 
final tabulated statistics that were produced 
after postprocessing the NMF (1). This 
postprocessing ensured the final published data 
met data consistency requirements (such as 
nonnegative population counts), but it may 
have also introduced systematic biases (2-7). 
The Census Bureau must provide data users 
access to the NMF in usable form to facilitate 
the wide array of use cases for Census data. 
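
The bias concern can be illustrated with a small simulation. The sketch below is a simplified stand-in, not the Census Bureau's procedure: it assumes zero-mean Gaussian noise with a hypothetical scale `sigma` and a postprocessing step that only rounds counts and clips them at zero. Under those assumptions, the raw noisy values stay unbiased on average, while the postprocessed values overstate small counts.

```python
import random
import statistics


def noisy_count(true_count, sigma, rng):
    """Add zero-mean Gaussian noise to a tabulated count (illustrative mechanism only)."""
    return true_count + rng.gauss(0.0, sigma)


def postprocess(value):
    """Enforce a consistency rule: published counts are nonnegative integers."""
    return max(0, round(value))


def simulate_bias(true_count, sigma=5.0, trials=20000, seed=0):
    """Return (mean error of raw noisy values, mean error of postprocessed values)."""
    rng = random.Random(seed)
    raw = [noisy_count(true_count, sigma, rng) for _ in range(trials)]
    published = [postprocess(v) for v in raw]
    return (
        statistics.fmean(raw) - true_count,
        statistics.fmean(published) - true_count,
    )


if __name__ == "__main__":
    # Small counts are pushed upward by clipping; large counts are barely affected.
    for count in (0, 2, 50):
        raw_bias, pub_bias = simulate_bias(count)
        print(f"true={count:3d}  raw bias={raw_bias:+.2f}  postprocessed bias={pub_bias:+.2f}")
```

An analyst who has the unprocessed noisy values can model the noise directly instead of inheriting the clipping bias, which is the practical argument for releasing the NMF in usable form.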

In April, after public requests (2, 8), the 
Census Bureau released a demonstration 
NMF based on the 2010 Census data (9). 
It plans to release the NMF for the 2020 
Census later this year (10). Unfortunately, the 
current NMF release is difficult to process 
and is unlikely to be useful for most Census 
data users (11). 

To help users work with the NMF, the 
Census Bureau should host an unnested, 
labeled version on the Census website with 
Application Programming Interface access so 
researchers can more easily access and ana- 
lyze data. Centralized NMF documentation 
should be made available that explains the 
high-level structure of the NMF and its 
relation to published decennial statistics 
and tabulation geographies. Aggregation 
specifications should link raw noisy mea- 
surements to traditional tabulation statistics 
to facilitate statistical analysis. An addi- 
tional version of the NMF for which these 
aggregations have already been performed 
should also be made available, so that each 
tabulation has a single estimate. Detailed 
geographic information for the NMF, includ- 
ing shapefiles and geography assignment 
files that describe the relationship between 
traditional tabulation geographies and NMF 
geographies, will also help analysts. 

Census data serve as the backbone for a 
substantial number of scientific analyses and 
policy decisions. Producing a more accessi- 
ble and useful NMF will benefit researchers 
and facilitate more accurate and applicable 
conclusions without compromising the con- 
fidentiality of individual Census responses. 
Chile’s road plans 
threaten ancient forests 


During the United Nations Biodiversity 
Conference (COP15) in December 2022, 
nearly 200 countries, including Chile, 
agreed to halt biodiversity loss by 2030 
and to take urgent actions to stop the 
extinction of endangered species. Despite 
this commitment, the Chilean govern- 
ment is pushing for the construction of a 
road that would cross the Alerce Costero 
National Park (1), an area of global impor- 
tance for biodiversity conservation (2) and 
home to the endangered conifer Fitzroya 
cupressoides (3). Throughout the world, 
roads threaten biodiversity and ecosystem 
functions (4). Before pushing this project 
ahead, Chile should consider the likelihood 
that the road will undermine the country’s 
progress toward international environmen- 
tal commitments. 

Fitzroya, which grows exclusively 
in Chile and Argentina, is one of the 
longest-living tree species on Earth (5, 6). 
Fitzroya forests are among the forests 
that sequester the most carbon world- 
wide, and they provide critical ecosystem 
services and a wealth of historical and 
environmental information (7). Fitzroya 
populations face a high risk of extinc- 
tion after centuries of overexploitation 
and burning (7) and, more recently, as a 
result of climate change (3). 

The Alerce Costero National Park is the 
only area that protects a genetically unique 
Fitzroya population and the last remnants 
of species-rich Valdivian temperate rainfor- 
ests from the Coastal range (8, 9). Building 
a road through this vulnerable ecosystem 
would increase the risk of invasion by alien 
species, facilitate illegal logging, and greatly 
increase the probability of extensive wild- 
fires in the park (4). More than 90% of wild- 
fires occur within 1 km of roads in Chile (10). 

Chile’s proposed road completely ignores 
the COP15 agreement. The government 
must honor its commitments and prioritize 
the protection of the country’s most endan- 
gered species. The global biodiversity crisis 
and the unprecedented high risk of species 
extinction (11) call for timely and concrete 
actions. The preservation of roadless areas 
is critical to the goals of reducing extinc- 
tion risks and protecting 30% of the planet. 
TECHNICAL COMMENT ABSTRACTS 


Comment on “Policy impacts of statistical 
uncertainty and privacy” 


Yifan Cui et al. 

Steed et al. illustrate the crucial impact that 
the quality of official statistical data products 
may exert on policy decisions. We underscore 
the importance of conducting principled qual- 
ity assessment of official statistical data prod- 
ucts. We observe that the quality assessment 
procedure employed by Steed et al. needs 
improvement, due to the inadmissibility of the 
estimator used and the inconsistent probabil- 
ity model it induces on the joint space of the 
estimator and the observed data. We propose 
alternative statistical methods to conduct prin- 
cipled quality assessments for official statisti- 
cal data products. 

Full text: dx.doi.org/10.1126/science.adf9724 


Response to Comment on “Policy impacts of 
statistical uncertainty and privacy” 


Ryan Steed et al. 

Cui et al. propose a valuable improvement to 
our method of estimating lost entitlements 
due to data error. Because we don't have 
access to the unknown, “true” number of 
children in poverty, our paper simulates data 
error by drawing counterfactual estimates 
from a normal distribution around the of- 
ficial, published poverty estimates, which we 
use to calculate lost entitlements relative 
to the official allocation of funds. But, if we 
make the more realistic assumption that the 
published estimates are themselves nor- 
mally distributed around the “true” number 
of children in poverty, Cui et al.’s proposed 
framework allows us to reliably estimate lost 
entitlements relative to the unknown, ideal 
allocation of funds—what districts would 
have received if we knew the “true” number 
of children in poverty. 
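
The simulation approach described first, with draws centered on the published estimates, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: funds are split in proportion to poverty counts (a hypothetical rule, far simpler than real entitlement formulas), a district's loss is taken as the shortfall of its official allocation against the counterfactual one, and the function names and parameters are invented for the example.

```python
import random


def allocate(counts, budget):
    """Split a fixed budget across districts in proportion to their counts (hypothetical rule)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [budget * c / total for c in counts]


def expected_lost_entitlements(published, std_errors, budget, draws=5000, seed=0):
    """Average per-district shortfall of the official allocation versus simulated counterfactuals."""
    rng = random.Random(seed)
    official = allocate(published, budget)
    lost = [0.0] * len(published)
    for _ in range(draws):
        # Counterfactual estimates: normal draws centered on the published values, floored at zero.
        simulated = [max(0.0, rng.gauss(p, s)) for p, s in zip(published, std_errors)]
        counterfactual = allocate(simulated, budget)
        for i, (alt, off) in enumerate(zip(counterfactual, official)):
            # Only count cases where the district would have received more.
            lost[i] += max(0.0, alt - off)
    return [x / draws for x in lost]


if __name__ == "__main__":
    published = [1000, 500, 100]   # illustrative poverty counts
    std_errors = [50, 50, 50]      # illustrative standard errors
    print(expected_lost_entitlements(published, std_errors, budget=1_600_000))
```

Centering the draws on the published estimates measures loss relative to the official allocation. The framework proposed by Cui et al. instead targets the allocation implied by the unknown true counts, which this sketch does not attempt.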

Full text: dx.doi.org/10.1126/science.adh2297 


2 JUNE 2023 * VOL 380 ISSUE 6648 903 



PHOTO: COOPER ET AL. 


Edited by Michael Funk 


A cool path for making glass 
Printing glass with additive manufacturing 
techniques could provide 
access to new materials and struc- 
tures for many applications. However, 
one key limitation to this is the high 
temperature usually required to cure glass. 
Bauer et al. used a hybrid organic-inorganic 
polymer resin as a feedstock material that 
requires a much lower temperature for 
curing (see the Perspective by Colombo 
and Franchin). The ability to form transparent, 
fused silica at only 650°C opens up 
different uses for the material. The glass 
produced has excellent spatial resolution, 
optical quality, and mechanical 
properties. —BG 


Science, abq3037, this issue p. 960; 
see also adi2747, p. 895 


A fused silica lattice created 
by 3D printing 


Self-healing, self-aligning 
polymers 

One advantage of using soft 
materials for robotic devices 
is that there is greater scope 
for self-healing, but a challenge 
for multilayer devices 
is to ensure realignment after 
damage. Cooper et al. pres- 
ent a method for healing 
multilayered and functional 
polymer materials by showing 
how a combination of dynamic 
hydrogen-bonding interactions 
and phase separation between 
different polymeric building 
blocks can be leveraged to 
achieve simultaneous autono- 
mous realignment and healing 
of multilayered polymer films. 
This approach can restore both 
the mechanical and functional 
properties of complex polymer 
composites and even enables 
underwater self-assembly. 
—MSL 

Science, adh0619, this issue p. 935 


ANTHROPOLOGY 
A hand’s breadth 

For as long as humans have pro- 
duced things, there have been 
reasons to measure. Early mea- 
surements made use of what 
people had at hand—parts of 
their bodies—to create relatively 
standardized measurements. 
Kaaronen et al. looked at the 
development and use of body- 
based measurements across 
more than 180 cultures (see the 
Perspective by Chrisomalis) and 
found that such measures occur 
across cultures and commonly 
form the base of measurement 
systems. In many cultures, stan- 
dardized measurement systems 
have replaced body-based ones, 
but these do persist and are 
sometimes superior to standard 
measurement, especially when 
the goal is to build something 
for use by a specific person. 
—SNV 

Science, adf1936, this issue p. 948; 
see also adi2352, p. 894 


Step by step 
Developing highly selective 
aptamers, folded RNA or DNA 
oligonucleotides that bind to 
ligands, is a challenge because 
the sequence space is difficult 
to explore efficiently, nucleo- 
tides have limited chemical 
functionality, and methods 
for predicting RNA structure 
are not very accurate. Yang 
et al. developed an approach 
for sequential optimization of 
aptamers by modifying the 
target molecule with various 
functional groups. To design 
a sensor for the amino acid 
leucine, they used multiple 
derivatives to isolate aptamers 
with high affinity and selectiv- 
ity. They then used a stepwise 
approach based on substruc- 
ture to generate an aptamer 
for the antifungal drug voricon- 
azole. —MAF 

Science, abn9859, this issue p. 942 
BIOSENSING 
Miniaturized wireless 
tracking 


Minimally invasive medical 
procedures often require 
cameras or markers to track 
locations within the body. 
However, there are places 
that cables cannot reach, 
and there are challenges with 
imaging into deep tissues or 
trying to limit exposure to 
harmful radiation. Gleich et 
al. developed an innovative 
platform for magnetic track- 
ing and sensing applications 
using magneto-mechanical 
resonators (MMRs). In theory 
and experiments, the authors 
showed that MMRs can 
outperform existing technolo- 
gies such as radiofrequency 
markers in terms of sensitiv- 
ity. They also demonstrate 
sensing applications (position 
and orientation, pressure, and 
temperature) and provide 
examples of spatial tracking in 
three dimensions. —MSL 
Science, adf5451, this issue p. 966 


ULTRAFAST DYNAMICS 
The view from rhodium 


The capacity of metals such 
as rhodium and palladium 
to cleave carbon-hydrogen 
bonds facilitates numerous 
useful chemical reactions. 
Fundamental studies of the 
underlying dynamics at the 
metal center have often relied 
indirectly on shifts in the vibra- 
tional frequency of a spectator 
carbon monoxide ligand as the 
reaction with hydrocarbons 
ensues. Jay et al. used x-ray 
spectroscopy to study the 
ultrafast evolution of rho- 
dium’s electronic state directly 
as the metal bound and then 
broke a carbon-hydrogen 
bond in octane. —JSY 

Science, adf8042, this issue p. 955 


PHYSIOLOGY 
A neurotrophin to 
maintain liver mass 


In the healthy liver, a subset 
of hepatocytes proliferates 
to ensure a defined organ 
size. Trinh et al. explored 
how hepatocyte proliferation 
is supported by pericytes 
known as hepatic stellate 
cells (also see the Focus by 
Schoenberger and Tchorz). 
Hepatic stellate cell ablation in 
adult mice caused the liver to 
shrink over time because the 
hepatocytes stopped prolifer- 
ating, leading to gradual tissue 
loss. Hepatic stellate cells 
were a source of the growth 
factor neurotrophin-3, which 
drove hepatocyte proliferation 
in vitro and in hepatic stel- 
late cell-depleted mice by 
stimulating the receptor TrkB. 
—AMV 
Sci. Signal. (2023) 
10.1126/scisignal.adf6696, 
10.1126/scisignal.adh5460 


T CELLS 
T cells contribute 
to psoriasis 


Psoriasis is a chronic inflam- 
matory skin disorder that 
can be triggered by infection 
with group A Streptococcus 
(GAS). It is known that GAS- 
induced immune responses 
can promote psoriasis, but 
how T cells are involved in 
the pathogenesis is unclear. 
CD1a is a cell surface protein 
that presents lipid antigens 
to T cells and is known to 
be linked to psoriasis. Chen 
et al. studied peripheral 
blood and skin samples from 
human participants, identifying 
a CD1a-restricted, 
GAS-responsive population of 
T cells with diverse func- 
tionalities. These cells were 
expanded in patients with pso- 
riasis and were also reactive 
to the self-antigen lysophos- 
phatidylcholine, which is 
increased in inflammatory 
conditions. Skin inflamma- 
tion was exacerbated after 
GAS infection in transgenic 
mice expressing human CD1a. 
These findings demonstrate 
that clonal expansion of CD1a-restricted 
T cells induced by 
GAS infection can drive auto- 
reactivity in psoriasis. —HMI 
Sci. Immunol. (2023) 
10.1126/sciimmunol.add9232 
METABOLISM 
A search for 
cholesterol genes 


Despite substantial progress in 
understanding coronary artery 
disease, it remains one of the top 
causes of death, and cholesterol 
metabolism plays a major role in 
its progression. In some cases, 
genetic variants that alter the 
uptake of low-density lipoprotein 
(LDL) cholesterol, the “bad” type, 
have been characterized and even 
targeted with specific therapies, 
but these are only found in small 
numbers of patients. Hamilton 
et al. combined genome-scale 
CRISPR screening, mouse experi- 
ments, and analysis of human 
data from the UK Biobank to iden- 
tify the hundreds of genes and 
some pathways involved in LDL 
metabolism, helping to identify 
potential targets for future thera- 
peutic development. —YN 
Cell Genom. (2023) 
10.1016/j.xgen.2023.100304 


HOST DEFENSE 
CDC key to broad 
antibacterial immunity? 


CD4+ T cells play an important role 
in immune defense against the 
bacterial pathogen Streptococcus 
pneumoniae (pneumococcus), 
but the antigens that they rec- 
ognize are not well understood. 
Ciacchi et al. report the existence 
of a highly immunogenic epitope 
derived from the pneumococcal 
cholesterol-dependent cytolysin 
(CDC) and the virulence factor 
pneumolysin (Ply). A polyclonal 
repertoire of aB CD4T cells from 
the majority of blood donors 
tested recognizes the Ply 55-414 
undecapeptide in the context 

of broadly expressed human 
leukocyte antigen allotypes. 
Moreover, Ply, 44,—-Specific 

CDA4T cells can also recognize 
CDCs from a wide range of other 
bacterial species, suggesting that 
this conserved epitope might be 
a productive target for vaccines 


science.org SCIENCE 


& 


C 


RESEARCH 


ALSO IN SCIENCE JOURNALS    Edited by Michael Funk


HUMAN GENETICS 
Heart-brain connections 
and genetics 


It is known that cardiovascular 
disorders correlate with some 
neurological and psychiatric 
conditions, but it is not always 
clear what the connections are 
and whether they are caused by 
an innate predisposition or by the 
stress induced by having a medi- 
cal condition. To detangle these 
questions, Zhao et al. examined 
imaging and genetic data from 
tens of thousands of participants 
in the UK Biobank and BioBank 
Japan (see the Perspective by 
Sacher and Witte). Through this 
large-scale analysis, the authors 
uncovered correlations between 
structure and function of both 
the heart and the brain, such as 
links between specific features of 
cardiac imaging and neuropsychi- 
atric disorders. The authors also 
used Mendelian randomization 
to demonstrate shared genetic 
influences on both the brain and 
the heart. —YN 

Science, abn6598, this issue p. 934; 

see also adi2392, p. 897 


CIRCADIAN RHYTHMS 
Controlling clock-neuron 
coupling 

Coordination of physiology with 
daily rhythms is regulated by the 
neurons of the suprachiasmatic 
nucleus (SCN), the central pace- 
maker of the biological clock. 

Tu et al. describe a signaling 
mechanism at cilia in these neu- 
rons that keeps the individual 
cells of the SCN synchronized 
(see the Perspective by Kim 

and Blackshaw). The length 

and abundance of primary cilia 
in SCN neurons oscillated with 
daily light-dark cycles. Cilia orga- 
nize signaling by the morphogen 
Sonic Hedgehog (SHH), and 
regulation of the expression of 
this gene was required for syn- 
chrony of the SCN cells. In mice 
exposed to an altered light cycle 
to induce experimental jet lag, 
disrupting SHH signaling allowed
the animals to adjust more
quickly to the altered environ- 
ment. —LBR 
Science, abm1962, this issue p. 972; 
see also adi3177, p. 896


THALASSEMIA 
Linking blood and bone in 
beta-thalassemia 


Why beta-thalassemia results in 
bone defects in some patients 
is unclear. Aprile et al. now show
that the elevated erythropoietin 
seen in beta-thalassemia results 
in increased fibroblast growth 
factor-23 (FGF23) in the bone 
and bone marrow, which can 
produce bone defects. A small 
peptide inhibiting FGF23 both 
restored the bone marrow hema- 
topoietic stem cell niche and 
rescued bone defects in a mouse 
model. This study thus ties the 
blood and bone together in beta- 
thalassemia and demonstrates 
an avenue to target their patho- 
logical interaction. —CAC 
Sci. Transl. Med. (2023) 
10.1126/scitranslmed.abq3679


CHEMISTRY 
Using electrons to 
replace metals 


Aryl-aryl cross-couplings are 
critical to the efficient synthesis 
of pharmaceuticals, electronic 
materials, and fine chemicals. 
These reactions are typically per- 
formed with metal catalysts that 
carry expensive ligands. Abe and 
Shirakawa found an exciting alter- 
native that uses light to generate 
electrons that act as catalysts for 
the coupling under mild condi- 
tions, thus bypassing the need for 
the metal catalysts. —MG
Sci. Adv. (2023) 
10.1126/sciadv.adh3544 


PRIMATE GENOMES 
A global primate resource 


Primates are a widely dis- 
tributed and variable group. 
Characterizing primate evolution 
and variation not only allows
us to better understand and
conserve species, many of which 
are highly threatened, but will 
also help us to better under- 
stand ourselves. Kuderna et al. 
present high-coverage genome- 
sequence data across all 16 
primate families and 86% of 
species. The authors used these 
data to create a new phylogeny, 
explore relationships between 
population size and mutation 
rate, measure levels of threat, 
and identify missense mutations 
as they relate to those found in 
our own species. —SNV 

Science, abn7829, this issue p. 906 


PRIMATE GENOMES 
Understanding primate
evolution


Although humans think of 
ourselves as unique among 
animals—and we are in many 
ways—we are fundamentally one 
species among hundreds of oth- 
ers in the primate lineage. Thus, 
as we learn about evolution and 
adaptation across this group, 
we also learn about ourselves 
at both basal and derived levels. 
Shao et al. looked across 50 
primate genomes in a compara- 
tive phylogenetic framework to 
resolve patterns of gene evolu- 
tion, selection, and adaptation. 
They identified thousands of 
genes under selection that 
contribute to the phenotypic 
shaping of this varied lineage. 
Many of the innovations that 
they identified occurred in the 
ancestral lineage, meaning that 
they are widely shared across 
the group. —SNV 

Science, abn6919, this issue p. 913 


PRIMATE GENOMES 
Primate histories 


The process of speciation 
among populations is not 
instantaneous. Within genomes 
of related species, some 
regions will not show evidence 
of difference for a long time 
after ecological speciation
has occurred, a process called
incomplete lineage sorting (ILS).
By accounting for ILS across 
primates, Rivas-Gonzalez et al. 
were able to produce a primate 
phylogeny that agrees with 
fossil estimates (unlike past 
attempts). Patterns of ILS allow 
for estimates of ancestral popu- 
lation sizes and the impacts of 
selection and disease resistance 
within this group. —SNV 

Science, abn4409, this issue p. 925 


PRIMATE GENOMES 
Two make one 


Hybridization can occur between
closely related species, but the 
offspring often have reduced 
fitness, suggesting that such 
interbreeding is an evolutionary
dead-end. Despite this general 
impression, hybridization does 
sometimes lead to viable, or 
even more fit, offspring. In such 
cases, reproductive prefer- 
ence or reinforcement can lead 
to the divergence of a hybrid 
lineage. Wu et al. used genome
sequences from a group of mon- 
key species in the Rhinopithecus 
genus and found clear evidence 
that the gray snub-nosed mon- 
key is derived from hybridization 
between the golden snub-nosed 
monkey and the ancestor of two 
extant Rhinopithecus species. 
The unusual coat color seen in 
the gray snub-nosed monkey is 
caused by this mixing. —SNV 
Science, abl4997, this issue p. 926 


PRIMATE GENOMES
Shaped by the cold


The evolution of sociality is 
perennially fascinating to 
humans as highly social crea- 
tures, but understanding how 
such complex behavior emerged 
is challenging. Qi et al. looked 
across the colobine group of 
monkeys, which show varying 
levels of sociality, using fossils, 
genomics, ecology, and biogeography
to reveal the drivers
of and mechanisms underlying
social evolution. They found that 
adaptation to cold climates led 
to both behavioral and genetic
changes, which in some groups
furthered social complexities 
such as prolonged maternal care 
and reduced male-male aggres- 
sion. —SNV 

Science, abl8621, this issue p. 927 


PRIMATE GENOMES 
A complex history 


It has been increasingly recog- 
nized that hybridization is not 
a rare anomaly that occurs at 
species range edges but can 
be integral to the process of 
speciation. One group of spe- 
cies that has been identified as 
having a history of hybridization 
is that containing the genus 
Papio, the baboons. Sørensen et
al. used high-coverage whole- 
genome sequencing to reveal 
the evolutionary history of the 
six overlapping baboon species 
and found evidence of repeated 
admixture, including a popula- 
tion derived from three distinct 
lineages. Understanding such 
evolutionary complexity in 
baboons can shed light on how 
this process occurs more widely. 
—SNV 

Science, abn8153, this issue p. 928 


PRIMATE GENOMES 
Finding benign variants 
across species 


As genomic analysis of human 
patients has become more 
common, many different genetic 
variants have been found. 
Unfortunately, it is difficult to 
know which are directly associ- 
ated with disease, especially for 
rare variants. To help address 
this gap in knowledge, Gao et al. 
collected gene-sequencing data 
from hundreds of nonhuman 
primates across 233 different 
species. Using these primates’ 
genomes, the authors mapped 
out common gene variants that 
were preserved by natural selec- 
tion and were not pathogenic. 
On the basis of these data, the 
authors built a deep learning net- 
work that can be applied to help 
identify benign genetic variants
in human patients based on
their similarity to those seen in
nonhuman primates. —YN
Science, abn8197, this issue p. 929


PRIMATE GENOMES 
Primate genomes help 
with human genes 


Large-scale genetic studies of 
human participants typically 
identify numerous gene variants 
that correlate with various traits 
and diseases. Unfortunately, 
it is difficult to determine 
which of these are biologically 
relevant and which ones are only 
correlated because of their chro- 
mosomal location near relevant 
genes. To clarify which gene 
variants are truly pathogenic, 
Fiziev et al. combined data from 
multiple large human biobanks 
with information derived from 
233 nonhuman primate species. 
The authors delineated rare 
pathogenic variants with strong 
effects and more common ones 
with weaker effects and demon- 
strated that the former generally 
confer a greater risk of severe or
early-onset disease. —YN 
Science, abo1131, this issue p. 930


BIOSENSING
Miniaturized wireless 
tracking 


Minimally invasive medical 
procedures often require 
cameras or markers to track 
locations within the body. 
However, there are places 
that cables cannot reach, 
and there are challenges with 
imaging into deep tissues or 
trying to limit exposure to 
harmful radiation. Gleich et 
al. developed an innovative 
platform for magnetic track- 
ing and sensing applications 
using magneto-mechanical 
resonators (MMRs). In theory 
and experiments, the authors 
showed that MMRs can 
outperform existing technolo- 
gies such as radiofrequency 
markers in terms of sensitiv- 
ity. They also demonstrate 
sensing applications (position 
and orientation, pressure, and 
temperature) and provide 
examples of spatial tracking in 
three dimensions. —MSL 
Science, adf5451, this issue p. 966 


ULTRAFAST DYNAMICS 
The view from rhodium 


The capacity of metals such 
as rhodium and palladium 
to cleave carbon-hydrogen 
bonds facilitates numerous 
useful chemical reactions. 
Fundamental studies of the 
underlying dynamics at the 
metal center have often relied 
indirectly on shifts in the vibra- 
tional frequency of a spectator 
carbon monoxide ligand as the 
reaction with hydrocarbons 
ensues. Jay et al. used x-ray 
spectroscopy to study the 
ultrafast evolution of rho- 
dium’s electronic state directly 
as the metal bound and then 
broke a carbon-hydrogen
bond in octane. —JSY 

Science, adf8042, this issue p. 955 


PHYSIOLOGY 
A neurotrophin to 
maintain liver mass 


In the healthy liver, a subset 
of hepatocytes proliferates 
to ensure a defined organ
size. Trinh et al. explored
how hepatocyte proliferation
is supported by pericytes
known as hepatic stellate 
cells (also see the Focus by 
Schoenberger and Tchorz). 
Hepatic stellate cell ablation in 
adult mice caused the liver to 
shrink over time because the 
hepatocytes stopped prolifer- 
ating, leading to gradual tissue 
loss. Hepatic stellate cells 
were a source of the growth 
factor neurotrophin-3, which 
drove hepatocyte proliferation 
in vitro and in hepatic stel- 
late cell-depleted mice by 
stimulating the receptor TrkB.
—AMV 
Sci. Signal. (2023) 
10.1126/scisignal.adf6696, 
10.1126/scisignal.adh5460 


T CELLS 
T cells contribute 
to psoriasis 


Psoriasis is a chronic inflam- 
matory skin disorder that 
can be triggered by infection 
with group A Streptococcus 
(GAS). It is known that GAS- 
induced immune responses 
can promote psoriasis, but 
how T cells are involved in 
the pathogenesis is unclear. 
CD1a is a cell surface protein
that presents lipid antigens 
to T cells and is known to 
be linked to psoriasis. Chen 
et al. studied peripheral 
blood and skin samples from 
human participants, identifying
a CD1a-restricted,
GAS-responsive population of 
T cells with diverse func- 
tionalities. These cells were 
expanded in patients with pso- 
riasis and were also reactive 
to the self-antigen lysophos- 
phatidylcholine, which is 
increased in inflammatory 
conditions. Skin inflamma- 
tion was exacerbated after 
GAS infection in transgenic 
mice expressing human CD1a.
These findings demonstrate 
that clonal expansion of CD1a-restricted
T cells induced by
GAS infection can drive auto- 
reactivity in psoriasis. —HMI 
Sci. Immunol. (2023) 
10.1126/sciimmunol.add9232 


2 JUNE 2023 • VOL 380 ISSUE 6648


IN OTHER JOURNALS


Edited by Caroline Ash
and Jesse Smith


METABOLISM 
A search for 
cholesterol genes 


Despite substantial progress in 
understanding coronary artery 
disease, it remains one of the top 
causes of death, and cholesterol 
metabolism plays a major role in 
its progression. In some cases, 
genetic variants that alter the 
uptake of low-density lipoprotein 
(LDL) cholesterol, the “bad” type, 
have been characterized and even 
targeted with specific therapies, 
but these are only found in small 
numbers of patients. Hamilton 
et al. combined genome-scale 
CRISPR screening, mouse experi- 
ments, and analysis of human 
data from the UK Biobank to iden- 
tify the hundreds of genes and 
some pathways involved in LDL 
metabolism, helping to identify 
potential targets for future thera- 
peutic development. —YN 
Cell Genom. (2023) 
10.1016/j.xgen.2023.100304 


HOST DEFENSE 
CDC key to broad 
antibacterial immunity? 


CD4⁺ T cells play an important role
in immune defense against the 
bacterial pathogen Streptococcus 
pneumoniae (pneumococcus), 
but the antigens that they rec- 
ognize are not well understood. 
Ciacchi et al. report the existence 
of a highly immunogenic epitope 
derived from the pneumococcal 
cholesterol-dependent cytolysin 
(CDC) and the virulence factor 
pneumolysin (Ply). A polyclonal 
repertoire of αβ CD4⁺ T cells from
the majority of blood donors
tested recognizes the Ply₄₂₇₋₄₄₄
undecapeptide in the context
of broadly expressed human
leukocyte antigen allotypes.
Moreover, Ply₄₂₇₋₄₄₄-specific
CD4⁺ T cells can also recognize
CDCs from a wide range of other 
bacterial species, suggesting that 
this conserved epitope might be 
a productive target for vaccines
and immunotherapies against
many different bacterial diseases. 
—SIS
Immunity (2023)
10.1016/j.immuni.2023.03.020


VASCULAR DEVELOPMENT 
Imaging vascular 
remodeling of skin 


An organized vascular system is 
imperative for proper organ devel- 
opment and function. The mouse 
skin provides a unique platform 
with which to visualize vascular 
network maturation and adult 
homeostatic states at single-cell 
resolution. Kam et al. sought to 
understand the principles of vas- 
cular network maturation using 
intravital imaging to spatiotem- 
porally track and manipulate skin 
blood vessels and endothelial 
cells (ECs) in vivo. In neonates, 
vessel regression drives capillary 
network expansion through EC 
migration. In adults, ECs become
positionally stable, and the
coordination of their collective 
rearrangements maintains vessel 
integrity. Adult, but not neonatal, 
ECs preferentially survive damage 
through a self-repair mechanism 
of damaged plasma membranes 
that prevents vessel regression. 
Thus, adult ECs prioritize self- 
repair and vessel maintenance 
more than those in the neonatal 
vasculature, in which vessels are 
expendable. —SMH 

Cell (2023) 10.1016/j.cell.2023.04.017 


SCIENTIFIC WORKFORCE 
Parity in physics starts 
in high school 


Over the past two decades, the 
number of undergraduate phys- 
ics degrees awarded to women 
in the United States has held 
steady at just 20%. Research 
shows that the decision to study 
physics is influenced by cultural 
associations and complex social
dynamics about “who does
physics.” Potvin et al. examined
the effect of physics lessons with
counternarratives, discourses
that provide perspectives of those
who have been marginalized, on
high school students’ future physics
career intentions. Their results
showed that female students and
students from minoritized racial
or ethnic groups who had been
exposed to the counternarratives
were more likely to think that they
had a possible future in physics,
demonstrating that high school
classrooms can be an effective
place to have equity discussions
surrounding science participation.
—MMc
Phys. Rev. Phys. Educ. Res. (2023)
10.1103/PhysRevPhysEducRes.19.010126


BIOLOGICAL INVASION
High costs of
invasive species


A subset of the myriad species
that people have introduced into
new regions become “invasive,”
with outsized effects on ecosystems
and human health, food
production, and livelihoods. However,
the costs associated with biological
invasions may be underrecognized
because their impacts often take a
long time to appear and accumulate.
Turbelin et al. compared the costs
of damage associated with invasive
species against those of natural
hazards, including storms, drought,
fires, floods, and earthquakes, both
globally and within the United States.
At both scales, storms incurred the
highest costs, but the total costs of
invasive species were similar to or
greater than those of other types of
natural hazards—and are increasing
over time. —BEL
Perspect. Ecol. Conserv. (2023)
10.1016/j.pecon.2023.03.002

Invasive species such as water hyacinth,
pictured here, have high economic costs.


2D MATERIALS 
A magnet with a twist 


The properties of layered
two-dimensional magnets depend
strongly on the number of layers
and the way they are stacked on 
top of each other. For instance, 
the material CrI₃, which orders
ferromagnetically in monolayer 
form, can have ferromagnetic 
or antiferromagnetic interlayer 
coupling depending on whether 
the stacking is rhombohedral or 
monoclinic. Natural bilayers have 
monoclinic stacking, resulting in 
antiferromagnetic coupling and 
no overall magnetization. Xie et 
al. studied what happens when 
two such bilayers are twisted with 
respect to each other by a small 
angle. Using magneto-optical 
measurements, the research- 
ers observed a nonzero overall 
magnetization and signatures of 
noncollinear spins that peaked at 
the twist angle of 1.1°. Future stud- 
ies will be needed to visualize the 
spin texture directly. —JSt 
Nat. Phys. (2023) 
10.1038/s41567-023-02061-z 


QUANTUM COMPUTING 
A quantum route to
solving graphs


Graph theory is a branch of 
mathematics in which practical 
problems such as optimization, 
networking, and operational 
research can be mapped 
geometrically. The time and 
resources required to find 
solutions to such problems 
can grow exponentially with 
the problem size and quickly 
become unsolvable on clas- 
sical computers. Deng et al. 
demonstrate the application of 
the intermediate-sized optical 
quantum computer “Jiuzhang” 
to solve two difficult graph the- 
ory problems: random search 
and simulated annealing. Their 
boson sampling strategy looks 
at the expectation probabilities 
of a number of single photons
making their way through a 
scattering matrix formed of 
complex photonic circuits 
representing the graphs. With 
increasing system size, such a
quantum approach is likely to 
present a computational advan- 
tage over classical algorithms. 
—ISO 
Phys. Rev. Lett. (2023) 
10.1103/PhysRevLett.130.190601


INTRODUCTION: There is increasing evidence
pointing to a close relationship between heart 
health and brain health, with cardiovascular 
diseases potentially leading to brain diseases 
such as stroke, dementia, and cognitive im- 
pairment. Magnetic resonance imaging (MRI) 
is a valuable tool that can be used to assess 
both the heart and brain, generating biomark- 
ers and endophenotypes for various clinical 
outcomes. However, although recent large- 
scale analyses have been conducted on heart 
and brain MRI-derived traits separately, few 
studies have explored the potential for multiorgan
MRI to examine heart-brain connections
and identify shared genetic effects. The struc- 
tural and functional links between the heart 
and the brain remain unclear. 


RATIONALE: Using multiorgan MRI and genetic 
data from >40,000 subjects, we aimed to quan- 
tify interorgan connections between the heart 
and brain and identify the underlying genetic 
variants. Specifically, we analyzed 82 cardiac 
and aortic MRI-derived traits across six cate- 
gories: left and right ventricles, left and right 
atria, and ascending and descending aortas, as
well as 458 brain MRI traits that measure brain
structure and function.


Heart-brain connections revealed by multiorgan imaging genetics. Top left: Quantifying the heart and
brain structure and function in MRI. Top right: Examples of associations between heart MRI traits and brain
white matter tracts. Bottom left: Genomic loci associated with heart MRI traits that overlapped with traits
and disorders of the heart and/or brain. Bottom right: Selected genetic correlations between heart MRI traits
and brain disorders.


RESULTS: After controlling for various cova- 
riates, we found that heart MRI traits were 
clearly associated with the brain across all im- 
aging modalities studied. We observed multi- 
ple patterns of association for brain gray matter 
morphometry, white matter microstructure, 
and functional networks. For example, we found 
that the left ventricle of the heart showed the 
strongest correlations with microstructure met- 
rics of cerebral white matter tracts, suggesting 
that adverse heart features were associated 
with poorer white matter microstructure. 

Our genome-wide association analysis of heart 
MRI traits identified 80 associated genomic 
loci (P < 6.09 × 10⁻¹⁰). We performed sex-specific
analysis and found that the genetic effects 
on heart structure and function were highly 
consistent between both sexes. Further, we 
conducted a systematic search of previously 
reported genetic results in these genomic loci 
and found that heart MRI traits had shared 
genetic influences and colocalized with heart 
and brain diseases and complex traits. 
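
The genome-wide cutoff of about 6.09 × 10⁻¹⁰ reported for the 80 loci is consistent with a Bonferroni adjustment of the conventional genome-wide significance level across the 82 heart MRI traits. The minimal Python sketch below reproduces that arithmetic. The 5 × 10⁻⁸ starting level and the function name `bonferroni_threshold` are assumptions made for illustration; this summary does not state how the cutoff was derived.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumption: the conventional genome-wide significance level for one trait.
GENOME_WIDE_ALPHA = 5e-8
# From the text: 82 cardiac and aortic MRI-derived traits were analyzed.
N_HEART_TRAITS = 82

gwas_threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_HEART_TRAITS)
print(f"Per-trait GWAS threshold: {gwas_threshold:.3e}")
```

Under these assumptions the result falls between 6.09 × 10⁻¹⁰ and 6.10 × 10⁻¹⁰, in line with the reported cutoff.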

We identified genetic correlations between 
heart MRI traits and various brain complex traits 
and diseases such as stroke, eating disorders, 
schizophrenia, cognitive function, and mental 
health traits. For example, adverse myocardial 
wall thickness condition was positively genet- 
ically correlated with stroke. We further used 
two-sample Mendelian randomization to ex- 
plore causal genetic links between the heart 
and brain, and our findings suggest that ad- 
verse heart features have genetic causal effects 
on several brain diseases such as psychiatric 
disorders and depression. 
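
Two-sample Mendelian randomization combines per-variant effect estimates on an exposure (here, a heart MRI trait) with estimates on an outcome (a brain disorder) taken from separate GWASs. This summary does not say which estimator was used. The sketch below shows the widely used inverse-variance weighted (IVW) estimator as one possibility; the function name `ivw_estimate` and the toy effect sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant contributes a Wald ratio (outcome effect / exposure effect),
    weighted by the inverse of its first-order variance.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if any(bx == 0 for bx in beta_exposure) or any(se <= 0 for se in se_outcome):
        raise ValueError("exposure effects must be nonzero and SEs positive")

    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical toy data: three variants whose outcome effects are exactly
# half of their exposure effects.
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.02],
)
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.4f}")
```

IVW assumes that every variant is a valid instrument, so in practice it is usually checked against estimators that are more robust to pleiotropy.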


CONCLUSION: This study deepened our under- 
standing of heart-brain links and their genetic 
basis. We observed that MRI measurements of 
the two organs were associated with each other, 
and this was independent of a wide variety of 
body measures, shared risk factors, and imag- 
ing confounders. We also uncovered genetic 
colocalizations and correlations between heart 
structure and function and brain clinical end 
points, suggesting that adverse heart metrics 
may have implications for brain abnormalities 
and the risk of brain diseases. By understand- 
ing human health from a multiorgan perspec- 
tive, we may be able to improve disease risk 
prediction and prevention and mitigate the 
negative effects of one organ disease on other 
organs that may be at risk. 


The list of author affiliations is available in the full article online. 
*Corresponding author. Email: htzhu@email.unc.edu


insights from magnetic resonance images

Cardiovascular health interacts with cognitive and mental health in complex ways, yet little is known
about the phenotypic and genetic links of heart-brain systems. We quantified heart-brain connections 
using multiorgan magnetic resonance imaging (MRI) data from more than 40,000 subjects. Heart 

MRI traits displayed numerous association patterns with brain gray matter morphometry, white matter 


microstructure, and functional networks. We identified 80 associated genomic loci (P < 6.09 × 10⁻¹⁰)
for heart MRI traits, which shared genetic influences with cardiovascular and brain diseases.

Genetic correlations were observed between heart MRI traits and brain-related traits and disorders. 
Mendelian randomization suggests that heart conditions may causally contribute to brain disorders. Our 
results advance a multiorgan perspective on human health by revealing heart-brain connections and 


shared genetic influences. 


A growing amount of evidence suggests
close interplays between heart health
and brain health (fig. S1). Cardiovascular
diseases may provide a pathophysiological
background for several brain diseases,
including stroke (1), dementia (2), cerebral small
vessel disease (3), and cognitive impairment 
(4, 5). For example, atrial fibrillation has been 
linked to an increased incidence of demen- 
tia (6) and silent cerebral damage (7) even in 
stroke-free cohorts (8). It has been consistently 
observed that heart failure is associated with 
cognitive impairment and eventually demen- 
tia (9), likely because of the reduced cerebral 
perfusion caused by the failing heart (10). Conversely,
mental disorders and negative psychological
factors may contribute substantially to
the initiation and progression of cardiovascular
diseases (11-13). Patients with mental illnesses
such as schizophrenia, bipolar disorder,
epilepsy, or depression show an increased incidence
of cardiovascular diseases (14-17). Acute
mental stress may cause a higher risk of atherosclerosis
because of stress-induced vascular inflammation
and leukocyte migration (18).
Primarily because of the lack of data, almost 
all prior studies on heart-brain interactions and 
associated risk factors (19-25) have focused on 
one (or a few) specific diseases or used small 
samples. Therefore, the overall picture of the 
structural and functional links between the 
heart and the brain remains unclear. 

In heart and brain diseases, magnetic reso- 
nance imaging (MRI)-derived traits are well- 
established endophenotypes. Cardiovascular 
magnetic resonance imaging (CMR) has been 
widely used to assess cardiac structure and 
function, yielding insights into the risk and 
pathological status of cardiovascular diseases 
(26-28). Brain MRI modalities provide de- 
tailed information about brain structure and 
function (29). Clinical applications of brain 
MRI have revealed the associated brain abnor- 
malities that accompany multiple neurological 
and neuropsychiatric disorders (30-32). More- 
over, twin and family studies have shown that 
CMR and brain MRI traits are moderately to 
highly heritable (33-35). For example, the left 
ventricular mass (LVM) has a heritability es- 
timate >0.8 (34). Most brain structural MRI 
traits are highly heritable (heritability ranges 
from 0.6 to 0.8) (36), and the heritability of 
brain functional connectivity is usually be- 
tween 0.2 and 0.6 (37). A few recent genome- 
wide association studies (GWASs) have been 
separately conducted on CMR (38-43) and 
brain MRI traits (44-51). For example, several
large-scale efforts have been made to discover 
genetic variants associated with brain struc- 
tures; examples include ENIGMA (37), Neuro- 
CHARGE (52), and IMAGEN (53). Although 
MRI has been widely used in clinical research 


and genetic mapping, few studies have used 
multiorgan MRI to examine heart-brain con- 
nections and identify the shared genetic signa- 
tures of the heart and the brain. 

In the present study, we investigated heart- 
brain connections using multiorgan imaging 
data obtained from >40,000 subjects in the 
UK Biobank (UKB) study (54). By using a re- 
cently developed heart segmentation and fea- 
ture extraction pipeline (55-57), we generated 
82 CMR traits from the raw short-axis, long- 
axis, and aortic cine images. These CMR traits 
included global measures of four cardiac cham- 
bers, the left ventricle (LV), right ventricle (RV), 
left atrium (LA), and right atrium (RA), and 
two aortic sections, the ascending aorta (AAo) 
and the descending aorta (DAo), as well as re- 
gional (58) phenotypes of the LV myocardial 
wall thickness and strain [table S1 and sup- 
plementary text (59)]. Then, we identified the 
relationships between the 82 CMR traits and a 
wide variety of the brain MRI traits discovered 
from multimodality images (60), including struc- 
tural MRI (164 traits), diffusion MRI (110 traits), 
resting functional MRI (resting fMRI) (92 global 
traits and >60,000 regional traits), and task 
fMRI (92 global traits and >60,000 regional 
traits). These brain MRI traits provided fine 
details of brain structural morphometry (44, 61)
(regional brain volumes and cortical thickness 
traits), brain structural connectivity (47, 62) 
[diffusion tensor imaging (DTI) invariant mea- 
sures of white matter tracts], and brain in- 
trinsic and extrinsic functional organizations 
(49, 63, 64) (functional activity and connectivity 
at rest and during a task) (table S2). To evaluate
the genetic determinants underlying heart-brain
connections, we performed GWASs for the
82 CMR traits to uncover the genetic architec- 
ture of the heart and aorta. Compared with 
existing GWASs of CMR traits (38-43), our study 
used a much broader group of cardiac and aortic 
traits, allowing us to identify the shared genetic 
components with a wide variety of brain-related 
complex traits and disorders. For example, (42) 
mainly focused on nine measures of the right 
heart, (38) analyzed six LV traits, and (43) studied 
three traits of diastolic function. Figure 1 pro- 
vides an overview of the study design and analy- 
ses. The GWAS results of 82 CMR traits can be 
explored and are freely available through the 
heart imaging genetics knowledge portal (Heart- 
KP) at http://heartkp.org/. 


Phenotypic heart-brain connections 


To verify that the 82 CMR traits are well de- 
fined and biologically meaningful, we first ex- 
amined their reproducibility using the repeat 
scans obtained from the UKB repeat imaging
visit (n = 2903; average time between visits,
2 years). For each trait, we calculated the intra- 
class correlation (ICC) between two observations 
from all revisited individuals. The average ICC 
was 0.653 (range = 0.369 to 0.970; table S1). 
Fig. 1. Overview of the study design and analyses. (A) Overview of the study. We used CMR and brain MRI traits as endophenotypes to explore the phenotypic and 
genetic connections between the heart and the brain. (B) Description of the overall workflow and the key analyses involved in each step. 


Some volumetric traits had very high ICC 
(>0.9), including the LV end-diastolic vol- 
ume (LVEDV), LVM, RV end-diastolic volume 
(RVEDV), RV end-systolic volume (RVESV), 
AAo maximum area, AAo minimum area, DAo 
maximum area, DAo minimum area, and global 
myocardial wall thickness. The ejection frac-
tion [such as the LV ejection fraction (LVEF)]
and distensibility traits (e.g., the DAo disten- 
sibility) had the lowest ICC among all volumetric 
traits (mean = 0.574 and 0.519, respectively). 
In addition, the average ICC was 0.760 for the 
17 wall thickness traits, 0.532 for the seven 
longitudinal peak strains, 0.569 for the 17 cir- 
cumferential strains, and 0.516 for the 17 radial 
strains. Additionally, we examined the changes 
in 82 CMR traits over a 2-year period and were 
able to replicate the direction of most of the 
aging effects (per 7.5 years) described in (55) 
[table S1 and supplementary text (59)]. Over-
all, these results suggest that the extracted CMR
traits have moderate to high within-subject 
reliability and can consistently delineate the 
cardiac and aortic structure and function. 
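
The reproducibility check above can be illustrated with a small sketch. The text does not say which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, and the function name `icc_oneway` and the toy values are hypothetical.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_visits) array.

    Assumption: the paper does not state its ICC variant; this is one common choice.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical toy values for one trait at the first and the repeat imaging visit.
visits = np.array([[140.0, 142.0], [155.0, 151.0], [120.0, 125.0], [170.0, 168.0]])
icc = icc_oneway(visits)
```

Values near 1 indicate that most of the variance lies between individuals rather than between a person's two scans.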

We examined the associations between CMR 
traits and brain MRI traits in UKB individuals 
of white British ancestry (n = 31,152; see the 
materials and methods for a list of adjusted 
covariates). At the Bonferroni significance lev- 
el (P < 1.33 × 10^-6), CMR traits were associated 
with a wide variety of brain MRI traits, includ- 
ing regional brain volumes, cortical thickness, 
DTI parameters, and resting and task fMRI 
traits (Fig. 2A, fig. S2, and table S3). Among 
the 4193 Bonferroni-significant associations in 
our discovery sample, 1574 were significant at 
the nominal level (0.05) in a holdout indepen- 
dent validation dataset (n = 5316) with con- 
cordant association signs (figs. S3 to S5). For 
example, global wall thickness was positively 
associated with the volumes of multiple sub- 
cortical brain structures (fig. S2B). Particu- 
larly, both left and right putamen volumes 
were associated with at least 10 wall thickness 
traits (fig. S4). Subcortical regions across both 
brain hemispheres showed consistent asso- 
ciation patterns, potentially highlighting the 
robustness of these correlations. Additional 
examples of replicated associations can be 
found in the supplementary text (59). 
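
As a worked check of the Bonferroni level used for these phenotypic associations, the sketch below assumes the correction covers every pairing of the 82 CMR traits with the five groups of brain MRI traits listed in the Fig. 2 caption. That reading of the test count is an assumption; the variable names are hypothetical.

```python
# Trait counts as listed in the Fig. 2 caption.
n_cmr_traits = 82
brain_trait_groups = {
    "regional brain volumes": 101,
    "cortical thickness": 63,
    "DTI parameters": 110,
    "resting fMRI": 92,
    "task fMRI": 92,
}

n_brain_traits = sum(brain_trait_groups.values())
n_tests = n_cmr_traits * n_brain_traits

# Bonferroni: divide the family-wise error rate by the number of tests.
threshold = 0.05 / n_tests
print(f"{n_tests} tests, threshold = {threshold:.2e}")
```

Under this assumption the count is 37,556 tests, which gives a threshold of about 1.33 × 10^-6.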

CMR traits were also correlated with brain 
structural and functional connectivity. For ex- 
ample, fractional anisotropy (FA) and mean 
diffusivity (MD) are two robust measures of 


brain structural connectivity and white mat- 
ter microstructure, with higher FA and lower 
MD values typically signifying better white 
matter integrity (65). The FA values of several 
white matter tracts consistently showed neg- 
ative associations with aortic areas (e.g., AAo 
and DAo minimum areas), LV traits (e.g., LVM, 
LVEDV, and wall thickness traits), and LA min- 
imum volume (LAVmin). Moreover, these CMR 
traits exhibited consistent positive associations 
with MD values (Fig. 2, B and C, and fig. S6). 
For resting fMRI, both mean functional con- 
nectivity and mean amplitude (i.e., functional 
activity) traits were negatively associated with 
volumetric measures of the four cardiac cham- 
bers, such as the LV cardiac output (LVCO), RV 
ejection fraction (RVEF), LA stroke volume 
(LASV), and RA ejection fraction (RAEF) (fig. 
S7). By contrast, positive correlations were wide- 
ly observed for wall thickness traits, longitu- 
dinal strains, and peak circumferential strains. 
The task fMRI traits showed similar patterns 
(fig. S8). To further discover fine-grained de- 
tails of CMR connections with brain functions, 
we examined pairwise associations between 82 
CMR traits and 64,620 high-resolution func- 
tional connectivity traits (49) in resting fMRI. 
Fig. 2. Phenotypic heart-brain associations. (A) The -log10 (P value) of phenotypic correlations between 82 CMR traits and five groups of brain MRI traits, including 
101 regional brain volumes, 63 cortical thickness traits, 110 DTI parameters, 92 resting fMRI traits, and 92 task fMRI traits. The dashed line indicates the Bonferroni 
significance level (P < 1.33 × 10^-6). Each CMR trait category is labeled with a different color. (B) Significant correlations (P < 1.33 × 10^-6) between fractional anisotropy 
values of white matter tracts and AAo minimum area. (C) Significant correlations (P < 1.33 × 10^-6) between mean diffusivity values of white matter tracts and global 
myocardial wall thickness at end diastole (global wall thickness). 
Bonferroni-significant associations (P < 7.15 x 
10°) were observed across the functional con- 
nectivity of the whole brain, with specific pat- 
terns emerging across different functional areas 
and networks (fig. S9, A and B). For example, 
the somatomotor network and its connectivity 
with the secondary visual network were asso- 
ciated with multiple CMR traits. Specifically, 
positive somatomotor associations were ob- 
served in the LVM, RVESV, RA minimum volume 
(RAVmin), global peak circumferential strain, 
and global wall thickness (figs. S9C and S10 to 
S13), and negative correlations were observed in 
all four ejection fraction traits [RVEF, LA ejection 
fraction (LAEF), RAEF, and LVEF] and LVCO 
(figs. S9D and S14 to S17). Additional examples 
can be found in the supplementary text (59) 
(figs. S18 to S28). Furthermore, we performed the 
above phenotypic association analyses separate- 
ly for males and females (figs. S29 to S32), used 
canonical correlation analysis (CCA) (66) to inves- 
tigate the multivariate associations between CMR 
traits and various groups of brain MRI traits, 
and examined the influence of environmental 
factors and biomarkers on the underlying mech- 
anisms of heart-brain interactions [figs. S33 to 
S35, table S4, and supplementary text (59)]. 
Fig. 3. Genetics of CMR traits in the UKB. (A) SNP heritability of 82 CMR traits across the six categories. The x axis displays the short names of CMR traits; see table S1 
for the full names of these traits. The average heritability of each category is labeled. (B) Ideogram of 80 genomic regions associated with CMR traits (P < 6.09 × 10^-10). 
Red and brown name labels denote genomic regions that have been replicated in the validation dataset after applying Bonferroni correction and at a nominal level, respectively. 
(C) LVESV was associated with the 22q11.23 region in both the UKB (index variant rs5760061) and BBJ (index variant rs5760054) studies. (D) LVESV was associated 
with the 8q24.13 region in both the UKB and BBJ studies (shared index variant rs34866937). 


Heritability and the associated genetic loci 
of 82 CMR traits 

We estimated the single-nucleotide polymor- 
phism (SNP) heritability for the 82 CMR traits 
using UKB individuals of white British ances- 
try (67) (n = 31,875). The mean heritability (h^2) 
was 22.9% for the 82 traits (range = 7.07 to 
70.2%; Fig. 3A), all of which remained signif- 
icant after adjusting for multiple testing using 
the Benjamini-Hochberg procedure to control 
the false discovery rate (FDR) at the 0.05 level 
(P < 1.09 x 10~%) (table S5). The h^2 of the AAo/ 
DAo maximum areas and AAo/DAo minimum 
areas was >50%. Among cardiac traits, the global 
wall thickness, RVESV, RVEDV, LV end-systolic 
volume (LVESV), LVEDV, and LVM had the 
highest heritability (h^2 > 37.8%). A sex-specific 
heritability analysis was conducted separately 
for females and males, and the heritability es- 
timates for both sexes were similar (mean h^2 = 
24.8 versus 22.6%, correlation = 0.910, P = 0.332; 
fig. S36). 
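
The Benjamini-Hochberg step mentioned above can be sketched as follows. This is a minimal illustration of the standard procedure, not the authors' code; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Compare the i-th smallest P value with its step-up bound q * i / m.
    below = ranked <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank that meets its bound.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# Hypothetical P values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, q=0.05)
```

The largest rejected P value plays the role of the reported significance cutoff for that family of tests.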

We next performed GWASs for the 82 CMR 
traits using this white British cohort (n = 31,875). 
All Manhattan and QQ plots can be browsed 
through the server on Heart-KP. The intercepts 
of linkage disequilibrium (LD) score regression 
(LDSC) (68) were all close to one, suggesting no 
genomic inflation of test statistics caused by 
confounding factors (mean intercept = 0.99986; 
range = 0.982 to 1.019). At the significance level 
6.09 × 10^-10 (5 × 10^-8/82, that is, the stan- 
dard GWAS significance threshold, addition- 
ally Bonferroni adjusted for the 82 traits), we 
identified independent (LD r^2 < 0.1) signifi- 
cant associations in 80 genomic regions (cyto- 
genetic bands) for 49 CMR traits, including 
35 for LV, 35 for AAo, 14 for DAo, 11 for RV, 
and 1 for LA (Fig. 3B and table S6). Detailed 
interpretations of these identified regions can 
be found below. These genetic effects on CMR 
traits were highly consistent in the sex-specific 
GWASs, in which males and females were an- 
alyzed separately (correlation = 0.944; P = 0.739; 
fig. S37). In the supplementary text (59), we 
further demonstrate that these CMR traits 
exhibited a highly polygenic genetic architec- 
ture and shared heritability with brain MRI 
traits, particularly with DTI parameters mea- 
suring white matter microstructure (figs. S38 
and S39 and table S7). 

To replicate the identified loci, we performed 
separate GWASs using holdout datasets in the 
UKB study that were independent from our 
discovery dataset. First, we repeated GWASs 
on a European dataset with 8252 subjects (see 
the materials and methods). For the 243 inde- 
pendent (LD r^2 < 0.1) CMR-variant associa- 
tions in the 80 genomic regions, 56 (23.04%, 
in 25 regions) passed the Bonferroni signifi- 
cance level (2.06 × 10^-4, 0.05/243) in this Eu- 
ropean validation GWAS, and 178 (73.25%, in 
61 regions) passed the nominal significance 
level (0.05) (Fig. 3B and table S8). All 178 asso- 
ciations had concordant directions in the two 
independent GWASs, and the correlation of 
their genetic effects was 0.963 (fig. S40). These 
results show a high degree of generalizability 
of our GWAS findings among European co- 
horts. We also performed GWAS on two non- 
European UKB validation datasets: the UKB 
Asian (UKBA, n = 500) and UKB Black (UKBBL, 
n = 271). One association between 8q24.3 and 
the RVEF passed the Bonferroni significance 
level (P = 8.281 x 10°”) in UKBA, and 14 more 
regions passed the nominal significance level. 
For UKBBL, 12 regions passed the nominal 
significance level, and none of them survived 
the Bonferroni significance level, which may 
be partially caused by the small sample size of 
this non-European GWAS. Additionally, we eval- 
uated the ancestry-specific effects using Asian 
GWAS summary statistics of three CMR traits 
[analogous to the LVEDV, LVESV, and LVEF 
(41)], which were generated from 19,000 sub- 
jects in the BioBank Japan (BBJ) study (69). At 
the stringent GWAS threshold of 1.666 × 10^-8 (5 × 10^-8/3), 
BBJ CMR traits identified indepen- 
dent (LD r^2 < 0.1) significant associations in 
22q11.23, 8q24.13, and 10q22.2. Of the three 
regions, 22q11.23 and 8q24.13 were among the 
80 regions that were discovered in the UKB 
white British cohort. These two regions were 
significantly associated with the LVESV in 
both the UKB and the BBJ studies (Fig. 3, C 
and D). The 10q22.2 had a small P value in the 
UKB GWAS (P = 1.58 x 10°), but did not sur- 
vive the 6.09 × 10^-10 threshold. 
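
The replication counts reported above follow a simple rule: a discovery association replicates if its validation P value passes a threshold and the effect directions agree. The sketch below illustrates that bookkeeping; the function name `replication_summary` and all numbers in the toy arrays are hypothetical, not taken from the study.

```python
import numpy as np

def replication_summary(beta_discovery, beta_validation, p_validation, alpha=0.05):
    """Count validation hits at Bonferroni and nominal levels, plus sign concordance."""
    bd = np.asarray(beta_discovery, dtype=float)
    bv = np.asarray(beta_validation, dtype=float)
    pv = np.asarray(p_validation, dtype=float)
    m = pv.size
    concordant = np.sign(bd) == np.sign(bv)
    nominal = pv < alpha
    bonferroni = pv < alpha / m  # e.g., 0.05/243 for 243 associations
    return {
        "n_associations": m,
        "bonferroni": int(bonferroni.sum()),
        "nominal": int(nominal.sum()),
        "nominal_concordant": int((nominal & concordant).sum()),
    }

# Hypothetical effect sizes and validation P values for four associations.
summary = replication_summary(
    beta_discovery=[0.10, -0.20, 0.05, 0.30],
    beta_validation=[0.08, -0.15, -0.02, 0.25],
    p_validation=[0.001, 0.03, 0.40, 0.20],
)
```

With 243 associations, the same rule gives the Bonferroni level 0.05/243, or about 2.06 × 10^-4, as stated in the text.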

Finally, we constructed polygenic risk scores 
(PRSs) using lassosum (70) to evaluate the out- 
of-sample prediction power of the discovery 
GWAS results (see the materials and methods). 
Among the 82 CMR traits, 75 had significant 
PRS at the FDR 5% level (P range = 4.47 x 10°? 
to 3.74 x 10°*; table S9). The highest incremen- 
tal R^2 value (after adjusting for the effects of 
covariates) was observed on the AAo mini- 
mum area and the AAo maximum area (7.20 
and 7.04%, respectively). To evaluate the cross- 
population performance, PRS was also con- 
structed on UKB white British discovery GWAS 
data using BBJ GWAS summary statistics of 
the LVEDV, LVESV, and LVEF. We found that 
the PRSs of these three traits were all signif- 
icant in the UKB (P range = 1.58 x 10" to 8.13 x 
10’; R^2 range = 3.90 x 10“ to 1.35 x 107°). The 
prediction accuracy was lower than that in 
the above within-European prediction anal- 
ysis (R^2 range = 7.72 x 10°? to 9.67 x 10°”), which 
may be explained by the smaller training GWAS 
sample size in the BBJ study and population 
differences between the UKB and BBJ cohorts. 
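
The incremental R^2 reported for the PRS can be read as the gain in variance explained when the score is added to a covariates-only model. The sketch below shows that calculation with ordinary least squares on simulated data. It does not reproduce lassosum, and the helper names (`r_squared`, `incremental_r2`) and the simulated effect sizes are assumptions for illustration.

```python
import numpy as np

def r_squared(design, y):
    """R^2 of an ordinary least-squares fit of y on design plus an intercept."""
    X = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def incremental_r2(covariates, prs, y):
    """Gain in R^2 from adding the PRS to a covariates-only model."""
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, prs]), y)
    return full - base

# Simulated data (hypothetical): one covariate, one PRS, and a noisy trait.
rng = np.random.default_rng(0)
n = 2000
covariates = rng.normal(size=(n, 1))
prs = rng.normal(size=n)
y = 0.5 * covariates[:, 0] + 0.3 * prs + rng.normal(size=n)
inc = incremental_r2(covariates, prs, y)
```

A score that carries no information beyond the covariates yields an increment of zero, which is why the quantity is reported after covariate adjustment.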


Pleiotropy of genetic variants across 
body systems 


To identify the shared genetic effects between 
CMR traits and complex traits, we performed 
association lookups for independent (LD r^2 < 
0.1) significant variants (and variants in their 
LD, r^2 ≥ 0.6, P < 6.09 × 10^-10) detected in our 
UKB white British GWAS. In the National Hu- 
man Genome Research Institute-European 
Bioinformatics Institute (NHGRI-EBI) GWAS 
catalog (71), our results tagged variants that 
have been linked to a wide range of traits and 
diseases, including heart diseases, heart struc- 
ture and function, blood pressure, lipid traits, 
blood traits, diabetes, stroke, neurological and 
neuropsychiatric disorders, psychological traits, 
cognitive traits, lung function, parental lon- 
gevity, smoking, and drinking. To evaluate 
whether two associated genetic signals were 
consistent with the shared causal variant, we 


applied the Bayesian colocalization analysis 
(72) for CMR traits and selected phenotypes 
with publicly available GWAS summary sta- 
tistics. Evidence of pairwise colocalization was 
defined as having a posterior probability of the 
shared causal variant hypothesis (PPH4) > 0.8 
(72, 73). Many shared genetic variants were 
found to be expression quantitative trait loci 
(eQTLs) in a recent large-scale eQTL meta- 
analysis of brain (74) and blood tissues (75). 
The traits with shared genetic effects are pre- 
sented in table S10, with selected pairs shown 
in Fig. 4 and figs. S41 to S108. Table S11 sum- 
marizes the results of colocalization and eQTL 
analyses. Below, we highlight genetic overlaps 
between CMR traits and complex traits and 
diseases of the heart and brain, as well as other 
clinical outcomes. 

First, we replicated 27 genomic regions that 
have been previously linked to cardiac and 
aortic traits, such as fractional shortening and 
LV internal dimension (fig. S41). There were 
21 regions associated with heart rate and elec- 
trocardiographic traits (e.g., QRS duration; 
figs. S42 to S46) and six regions with aortic 
measures (e.g., thoracic aortic aneurysms and 
dissections; figs. S47 and S48). In addition, 
30 regions had shared associations (LD r^2 ≥ 
0.6) with cardiovascular diseases, including 
12 regions with coronary artery disease (76) 
(figs. S49 and S50), nine regions with atrial 
fibrillation (77) (Fig. 4A and figs. S51 to S55), 
and five regions with hypertension (78) (figs. 
S56 to S58). Other heart diseases included 
abdominal aortic aneurysm (79) (figs. S47 and 
S59), mitral valve prolapse (80) (fig. S46), and 
idiopathic dilated cardiomyopathy (81) (figs. 
S60 and S61). There was widespread evidence 
of colocalization on many loci (PPH4 > 0.899). 
Additionally, 41 of the 80 genomic regions 
were associated with blood pressure traits such 
as diastolic or systolic blood pressure, pulse 
pressure, and mean arterial pressure (Fig. 4B 
and figs. S62 to S81). CMR traits were in LD 
(r^2 ≥ 0.6) with various cardiovascular and blood 
biochemistry biomarkers such as lipid traits 
(figs. S50, S56, S67, and S82), red blood cell 
count, blood protein levels, red cell distribution 
width, and plateletcrit (figs. S83 to S87). 

We found genetic pleiotropy between CMR 
traits and multiple brain-related complex traits 
and disorders. In the 6p21.2, 7p21.1, and 12q24.12 
regions, CMR traits were in LD (r^2 ≥ 0.6) with 
stroke (82) (e.g., ischemic stroke, large artery 
stroke, and small-vessel ischemic stroke), in- 
tracranial aneurysm (83), and moyamoya dis- 
ease (84) (Fig. 4, A and B, and fig. S50). The 
index variants of 7p21.1 (rs2107595) and 12q24.12 
(rs597808) were eQTLs of TWIST1 and ALDH2 
in human brain tissues (74), suggesting that 
these CMR-associated variants were known to 
affect gene expression in human brain. TWIST1 
was associated with cerebral vasculature defects 
(85), and there was a higher level of ALDH2 
Fig. 4. Selected genetic loci associated with both CMR trait and other complex traits and diseases. (A) In 6p21.2, we observed colocalization between 
the global myocardial wall thickness (WT) at end-diastole (WT global, index variant rs4151702) and atrial fibrillation (index variant rs3176326). The posterior 
probability of Bayesian colocalization analysis for the shared causal variant hypothesis (PPH4) is 0.997. In this region, the WT global was also in LD 
(r^2 ≥ 0.6) with ischemic stroke. (B) In 7p21.1, we observed colocalization between the DAo minimum area (DAo min area, index variant rs2107595) and systolic 
blood pressure (index variant rs57301765, PPH4 = 0.998). In this region, the DAo min area was also in LD with stroke, intracranial aneurysm, coronary artery 
disease, and moyamoya disease. (C) In 15q25.2, we observed colocalization between the regional myocardial wall thickness at end-diastole (WT AHA 7, index 
variant rs11638445) and schizophrenia (index variant rs12902973, PPH4 = 0.922). In this region, the WT AHA 7 was also in LD with bipolar disorder. AHA 7, 
American Heart Association (AHA) region 7. (D) We illustrated the colocalization between the AAo maximum area (AAo max area) and functional connectivity 
between the default mode and orbito-affective networks (shared index variant rs1678983) in 15q21.1 (PPH4 = 0.964). 
activity in the putamen and temporal cortex of 
patients with Alzheimer’s disease (86). CMR 
traits were also in LD (r^2 ≥ 0.6) with neuro- 
degenerative and neuropsychiatric disorders 
such as Parkinson’s disease (87) and Alzheimer’s 
disease (88) (fig. S88), hippocampal sclerosis 
of aging (89) (fig. S74), schizophrenia (90) (Fig. 
4C and fig. S49), bipolar disorder (91) (Fig. 4C 
and figs. S82 and S89), and eating disorders 
(92) (fig. S90). In addition, CMR traits were in 
LD (r^2 ≥ 0.6) with mental health traits such as 
neuroticism, depressive symptoms, subjective 
well-being, and risk-taking tendency (figs. S88 
and S91 to S93). 

For cognitive traits and education, we tagged 
17q21.31, 11p11.2, and 11q13.3 with cognitive func- 
tion and educational attainment (figs. S88, S93, 
and S94); 7q32.1 with reading disability (fig. 
S95); and 12q24.12 with reaction time (fig. S50). 
We also found shared associations (LD r^2 ≥ 0.6) 
in five regions with DTI parameters (47) (figs. 
S96 to S100); four regions with regional brain 
volumes (45) (figs. S101 to S104); and five re- 
gions with fMRI traits (49) (Fig. 4D and figs. 
S105 to S108). The colocalization analysis re- 
vealed that CMR traits shared causal genetic 
variants with these phenotypes, such as 15q25.2 
with schizophrenia, 15q21.1 with functional 
connectivity, as well as 11q24.3 and 12q24.12 
with white matter microstructure (PPH4 > 
0.809). There is substantial evidence support- 
ing the interplay between cardiovascular health 
and these brain traits and diseases. For example, 
people with better heart health have better 
cognitive abilities (93) and lower risk for brain 
disorders such as stroke and Alzheimer’s dis- 
ease (94). In addition, mental health disorders 
may result in biological processes and behav- 
iors that are associated with cardiovascular 
diseases (17, 95). Our findings indicate that 
cardiovascular conditions share substantial 
genetic components with brain diseases, men- 
tal health traits, and cognitive functions, sug- 
gesting a potential genetic basis for heart-brain 
connections. 

Genetic overlaps with other diseases and 
complex traits were also observed. For exam- 
ple, RVEDV was in LD (r^2 ≥ 0.6) with type 1 dia- 
betes (96) and type 2 diabetes (97, 98) in the 
12q24.12 region (fig. S50). CMR traits were in LD 
(r^2 ≥ 0.6) in 11 regions with lung conditions such 
as asthma (99) (fig. S82), idiopathic pulmo- 
nary fibrosis (100), interstitial lung disease 
(101) (fig. S88), and lung function (figs. S60, 
S64, S67, and S77). We also found shared ge- 
netic associations (LD r^2 ≥ 0.6) with smoking 
(figs. S50, S82, and S93) and alcohol consump- 
tion and alcohol use disorder (figs. S49, S88, 
and S93). 


Genetic correlations with brain disorders 
and complex traits 


First, we examined genetic correlations among 
82 CMR traits using cross-trait LDSC (102). 
Strong genetic correlations were observed with- 
in and between categories of CMR traits (fig. 
S109 and table S12). For example, RVEDV was 
genetically correlated with other RV traits, in- 
cluding RV stroke volume (RVSV), RVESV, and 
RVEF. The RVEDV was also correlated with 
CMR traits from other categories, such as AAo 
maximum area and DAo maximum area, LASV 
and RA stroke volume (RASV), as well as LVEDV, 
LVESV, LVM, and LVEF. In addition, we found 
a strong relationship between phenotypic and 
genetic correlations among all CMR traits (β = 
0.781, P < 2 x 10°). 

Next, we examined the genetic correlations 
between 82 CMR traits and 60 complex traits 
and diseases. At the FDR 5% level (82 x 60 tests), 
the CMR traits were associated with heart dis- 
eases, lung function, cardiovascular risk factors, 
and brain-related complex traits and dis- 
eases (table S12). For example, hypertension 
had clear genetic correlations with aortic traits 
and LV traits (Fig. 5A). The strongest correla- 
tion between LV traits and hypertension was 
found in wall thickness traits (P < 2.43 x 10°°), 
which were also associated with coronary ar- 
tery disease, type 2 diabetes, and stroke (Fig. 
5B). In addition, atrial fibrillation was signif- 
icantly associated with aortic, LA, and RA 
traits (P < 6.66 x 10~*), suggesting that atrial 
fibrillation might have a higher genetic sim- 
ilarity with LA and RA traits than with LV and 
RV traits. 

In both schizophrenia and bipolar disorder, 
we observed genetic correlations with multi- 
ple LV traits (Fig. 5C). Specifically, LVCO, LVEF, 
radial strains, and wall thickness traits showed 
positive genetic correlations with schizophre- 
nia and/or bipolar disorder. By contrast, peak 
circumferential strains had negative genetic 
correlations with the two brain disorders. Ad- 
ditionally, anorexia nervosa (an eating dis- 
order) was genetically associated with LAVmin 
and LAEF, whereas cognitive traits and 
neuroticism were mainly associated with right 
heart traits (RA and RV traits) (Fig. 5, D and 
E). For example, intelligence, cognitive function, 
and numerical reasoning were genetically cor- 
related with RA volumes. Lung functions (FEV 
and FVC) had genetic correlations with multi- 
ple CMR traits, with longitudinal strains show- 
ing the strongest correlations. There were more 
associations with other complex traits analyzed 
in previous GWAS, such as smoking, PR inter- 
val, blood pressure, education, risky behaviors, 
and lipid traits (fig. S110A). We also found high 
genetic correlations with four previously re- 
ported LV traits (41) (genetic correlation > 0.847, 
P < 6.44 x 10°") (fig. S110B). Additionally, 
we built PRS for 82 CMR traits and examined 
their associations with 276 phenotypes avail- 
able in the UKB study. The PRS analysis pro- 
duced genetic association patterns similar to 
those from the LDSC analysis. More details 
and interpretations are available in the sup- 


plementary text (59) (figs. S111 and S112 and 
table S13). 


Causal heart-brain relationships detected 
by Mendelian randomization 


In light of the widespread genetic correlations 
between the heart and brain, we examined 
their underlying causal genetic links using the 
82 CMR traits with Mendelian randomization 
(MR) (103). 

We investigated 11 well-powered (n > 20,000) 
brain-related clinical outcomes from the FinnGen 
database (104) and six neuropsychiatric dis- 
orders from the Psychiatric Genomics Consor- 
tium (105). We also evaluated nine cognitive and 
mental health traits such as intelligence and 
neuroticism (see the materials and methods). 

Most of the MR findings indicated genetic 
causal effects from the heart to the brain (table 
S14 and fig. S113). We identified causal genetic 
links underlying heart health and neuropsy- 
chiatric disorders. Specifically, multiple ge- 
netic causal effects of wall thickness traits, 
DAo minimum area, and LVESV to psychi- 
atric diseases and mental health traits were 
identified at the FDR 5% level (P < 1.68 x 10 *), 
such as the cross disorders [five major psy- 
chiatric disorders (106)], bipolar disorder, and 
depression (Fig. 6). The presence of heart con- 
ditions may adversely affect attitude and mood, 
which may ultimately lead to mental health 
problems such as depression and other psy- 
chiatric disorders (107). For example, hyper- 
trophic cardiomyopathy is associated with an 
increased risk of mood disorders (108). Heart 
muscle thickening makes it more difficult for 
the heart to pump blood, and when oxygen to 
the brain is reduced, mental health issues may 
develop (109). We also observed causal genetic 
effects of wall thickness traits on neuroticism, 
for which the phenotypic association has been 
identified (55). Moreover, AAo minimum area 
and AAo maximum area were causally linked 
to multiple FinnGen diseases of the nervous 
system, such as neurological diseases, sleep 
apnea, and episodic and paroxysmal disorders. 
Conversely, we identified several causal rela- 
tionships in which brain disorders were the 
exposure and CMR traits were the outcome; 
most of these were from sleep apnea to radial 
strains. In previous studies, the reduction in 
radial strain has been found in patients with 
moderate to severe obstructive sleep apnea 
(110), and our results demonstrate that this as- 
sociation may have a causal genetic component. 


Biological and gene-level analyses 


We performed gene-level association testing 
using GWAS summary statistics of the 82 CMR 
traits with MAGMA (111). We identified 163 sig- 
nificant genes for 48 CMR traits (P < 3.24 × 10^-8, 
Bonferroni adjusted for 82 traits) (table S15). 
Next, we mapped significant variants (P < 
6.09 × 10^-10) to genes by combining evidence 
Fig. 5. Genetic correlations between CMR traits and other complex traits and diseases. (A) We illustrated selected genetic correlations between CMR traits 
(x axis) and complex traits and diseases (y axis). The asterisks highlight genetic correlations that have passed multiple testing adjustments using the Benjamini-Hochberg 
procedure to control the FDR at the 5% level. (B to E) Illustration of CMR traits that exhibited genetic correlations with stroke (B), schizophrenia (C), anorexia nervosa (D), 
and cognitive function (E). 


of physical position, eQTL association, and 
three-dimensional chromatin (Hi-C) interac- 
tion using FUMA (112). We found 585 mapped 
genes, 440 of which were not identified in 
MAGMA (table S16). Moreover, 91 MAGMA or 
FUMA-identified genes had a high probability 
of being loss-of-function intolerant (113) (pLI > 
0.98), indicating significant enrichment of in- 
tolerance of loss-of-function variation among 
these CMR-associated genes (P = 1.68 x 107“). 
We conducted MAGMA gene-set analysis to 
prioritize enriched biological pathways and 
performed partitioned heritability analyses 
(114) to identify tissues and cell types (115) in 
which genetic variation contributed to differ- 
ences in CMR traits [fig. S114, table S17, and 
supplementary text (59)]. 

Ten genes were targets for 32 cardiovascular 
system drugs (116), such as 15 calcium chan- 
nel blockers [anatomical therapeutic chemical 
(ATC) code: C08] to lower blood pressure, five 
cardiac glycosides (ATC code: C01A) to treat 
heart failure and irregular heartbeats, and 
three antiarrhythmics (ATC code: C01B) to 


treat heart rhythm disorders (table S18). Three 
of these genes, CACNA1I, ESR1, and CYP2C9, 
and four more CMR-associated genes, ALDH2, 
HDAC9, NPSR1, and TRPA1, were targets for 11 
nervous system drugs, including four anti- 
epileptic drugs (ATC code: N03A) and two 
drugs for addictive disorders (ATC code: N07B). 
Some drug target genes have known biolog- 
ical functions in both the heart and the brain. 
For example, ALDH2 plays a role in clearance 
of toxic aldehydes, which is an important 
mechanism related to myocardial and cerebral 
Fig. 6. Genetic causal effects of CMR traits on psychiatric disorders. We illustrated selected significant (P < 1.68 x 10°“) causal genetic links from CMR traits 
(exposure) to psychiatric disorders (outcome) after adjusting for multiple testing using the Benjamini-Hochberg procedure to control the FDR at the 5% level. 
Category, the category of CMR traits; #IVs, the number of genetic variants used as instrumental variables. Different Mendelian randomization methods and their 
regression coefficients are labeled with different colors. See table S14 for data resources of psychiatric disorders. 


ischemia-reperfusion injury (117). Therefore, 
ALDH2 has been proposed to be a protective 
target for heart and brain diseases and dys- 
functions triggered by ischemic injury and 
related risk factors (118, 119). 

Finally, we conducted complex trait and dis- 
ease prediction using both genetic and multi- 
organ MRI data. We found that integrating 
genetic PRS, CMR traits, and brain MRI traits 
could enhance the prediction of multisystem 
diseases (e.g., diabetes) compared with using 
only one data type [figs. S115 and S116, tables 
S19 and S20, and supplementary text (59)]. 


Discussion 


The intertwined connections between heart 
and brain health are gaining increasing at- 
tention. This study quantified the heart-brain 
associations using CMR and brain MRI data 
from >40,000 individuals in one study cohort 
(UKB). After accounting for various body mea- 
surements, shared risk factors, and imaging 
confounders, we discovered that CMR traits 
were associated with specific brain regions, 
white matter tracts, and functional networks. 
For example, LV traits and aortic areas were 
connected to white matter microstructure, 
with FA and MD values exhibiting opposite 
directions. Univariate analysis and CCA indi- 
cated that aortic traits were associated with 
basal forebrain volumes in both the left and 
right hemispheres. The basal forebrain cho- 
linergic system, which is the primary cholin- 
ergic output of the central nervous system, is 
crucial in cognitive decline and dementia 
(120, 121). Reduced basal forebrain volume 


and vascular dysregulation are early predictors 
of Alzheimer’s disease pathology (122, 123). 
Moreover, several CMR traits, including LVM 
and ejection fraction measures, were asso- 
ciated with the somatomotor, auditory, and 
default mode networks in resting fMRI. The 
CMR associations with the default mode and 
other networks were generally in opposite 
directions. Increased LVM and reduced ejec- 
tion fraction traits are associated with a higher 
risk of cardiac diseases (55). Our findings sug- 
gest that abnormal functional connectivity 
within these networks could potentially act 
as an early biomarker of brain dysfunction 
associated with adverse cardiac conditions. 
Overall, our research indicates that there are 
associations between multimodal MRI measurements
of the heart and brain, hinting at
potential connections between cardiovascular
and neurological health.

We used multiorgan imaging data to iden- 
tify genetic variations that can affect both the 
heart and brain. Comprehending the genetic 
pleiotropies and the intricate directional and 
bidirectional interactions of human organs 
is a complex task (17). Our study provides evi- 
dence of causal genetic effects between CMR 
traits and brain disorders through MR analy- 
sis. Because CMR traits are endophenotypes of 
various cardiovascular diseases (e.g., hyper- 
tension and hypertensive diseases), these find- 
ings suggest that early intervention in heart 
conditions and the management of cardiac 
risk may have a positive impact on brain 
health. Numerous studies have examined the 
cognitive and neuropsychiatric effects of antihypertensive
medications, such as β-blockers
and calcium channel blockers (124, 125), and 
some recent studies reported their benefi- 
cial effects on psychiatric and neurological 
disorders. For example, in a meta-analysis of 
209 studies, antihypertensive medications 
were found to reduce dementia risk by 21% 
(126). Brain-penetrant calcium channel block- 
ers were associated with a lower incidence of 
neuropsychiatric disorders (127). The CMR 
and brain MRI traits prioritized in our heart- 
brain analyses could be helpful in identifying 
potential therapeutic targets and evaluating 
the therapeutic potential (or side effects) of 
existing antihypertensive drugs and heart dis- 
ease medications for mental health and neuro- 
degenerative disorders. 
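
As an illustration of the kind of estimate an MR analysis produces, the following minimal sketch implements the fixed-effect inverse-variance-weighted (IVW) estimator from per-variant summary statistics. IVW is one common MR method and is used here as an assumption; the text does not state which estimators underlie each reported result. The function name and the numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant contributes a ratio estimate (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += weight * bx * by
        denominator += weight * bx * bx
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical SNP effects on a CMR trait (exposure) and a disorder (outcome).
    bx = [0.1, 0.2, 0.3]
    by = [0.05, 0.1, 0.15]
    se = [0.01, 0.01, 0.01]
    estimate, std_err = ivw_estimate(bx, by, se)
    print(estimate, std_err)
```

In practice, such an estimate is usually compared with estimators that are more robust to pleiotropy before any causal interpretation is drawn.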

To mitigate the confounding effects of body 
size, our analyses have adjusted for a wide 
range of variables collected by the UKB study, 
including height, weight, whole-body fat free 
mass, waist-to-hip ratio, body surface area, 
and nonlinear high-order terms (128). How- 
ever, unobserved biological interactions and 
environmental factors may still confound 
the identified heart-brain connections. The 
concept of large-scale multiorgan imaging ge- 
netics analysis is relatively new, and future 
research using additional data resources, such 
as long-term longitudinal data and large-scale 
omics data from multiple organs, may pro- 
vide further insights into the shared biology 
between the brain and heart. In addition, 
our analyses faced challenges because of the 
use of different brain MRI traits generated 
from multiple imaging modalities. For exam- 
ple, previous studies have shown that lower 
FA and higher MD of white matter are as- 
sociated with accelerated brain aging, indicat- 
ing reduced microstructural coherence with 
aging (129). Resting functional connectivity 
strength has also been often found to be lower 
in the aging brain (130). In our analyses, certain
CMR traits correlated with distinct categories
of brain MRI traits in contrasting
directions. For example, higher wall thickness
was linked to larger subcortical regional
brain volumes in structural MRI, lower FA in 
diffusion MRI, and mostly stronger functional 
connectivity strength in resting fMRI of cor- 
tical brain areas. These findings may suggest 
that white matter and gray matter are differ- 
entially associated with certain heart func- 
tions. However, potential confounding factors 
cannot be completely ruled out, because the 
MRI traits were from different areas of the 
brain and extracted using different brain maps 
and processing procedures. To better establish 
and investigate these patterns, future studies 
could incorporate new brain MRI traits, such 
as microstructure measures in gray matter 
brain regions, and produce diffusion MRI and 
fMRI traits in the same brain atlas, allowing 
for a more comprehensive analysis of the struc- 
tural and functional relationships between the 
heart and the brain. 

In this study, imaging data were mainly 
from individuals of European ancestry. Com- 
paring UKB GWAS results with those of BBJ, 
we found both similarities and differences for 
genetic influences on CMR traits. For exam- 
ple, participants in UKB and BBJ had similar 
genetic effects on cardiac conditions at 22q11.23 
and 8q24.13, but only the BBJ cohort showed 
genetic effects at 10q22.2. There was also a 
reduction in PRS performance in the BBJ-UKB 
prediction compared with the prediction anal- 
ysis within the UKB study. Furthermore, the 
UKB study is well known for its “healthy vol- 
unteer” selection bias and may not be an ideal 
representation of the general European population
(131). It can be expected that some of
the genetic components that underlie heart- 
brain connections may be population specific 
or UKB specific. More open and large-scale 
imaging datasets (132) collected from global 
populations may help to identify causal var- 
iants associated with CMR traits in globally 
diverse populations and quantify population- 
specific heterogeneity of genetic effects. These 
new data will also enable the development of 
a better picture of neurological-cardiac inter- 
actions and allow researchers to examine the 
reproducibility of scientific findings. 

This paper specifically focuses on heart-brain 
connections. Because of the large amount of 
data collected in the UKB study, it is also 
possible to study the relationships between 
the brain and other human organs and sys- 
tems (133). For example, increasing evidence 
supports the gut-brain axis, which involves 
complex interactions between the central ner- 
vous system and the enteric nervous system 
(134). Patients with inflammatory bowel disease 
(e.g., Crohn’s disease) show a higher risk of men- 
tal disorders such as depression and anxiety 
(135). Multisystem analysis using biobank-scale 
data may provide insights for interorgan patho- 
physiological mechanisms and guide the pre- 
vention and early detection of brain diseases. 


Methods summary 

Our study aimed to explore the connection be- 
tween the heart and brain by analyzing multi- 
organ imaging data obtained from >40,000 
subjects. We used recently developed pipelines 
for cardiac and aortic MRI (55-57) to generate 
imaging traits for four cardiac chambers, LV, 
LA, RV, and RA, and two aortic sections, AAo 
and DAo. Moreover, we extracted various im- 
aging traits from multiple brain MRI modal- 
ities, including structural MRI (47), diffusion 
MRI (49), and resting-state and task-based 
fMRI (57). We then performed phenotypic and 
genetic analyses on these multiorgan imaging 
traits to examine the relationship between the 
heart and brain. 

We performed a discovery-replication anal- 
ysis to assess pairwise phenotypic associations 
between heart and brain imaging traits while 
controlling for various covariates such as body 
size (128), shared risk factors, and imaging con- 
founders. Additionally, we conducted separate 
univariate analyses of structural and func- 
tional connection patterns for both female and 
male subjects. To better understand the rela- 
tionship between CMR traits and different brain 
MRI modalities, we used CCA (66) to examine 
multivariate associations. 

We used data from UKB individuals of British 
ancestry to estimate the SNP heritability of 
82 CMR traits (67) and performed GWAS using 
linear mixed-effect models implemented in 
fastGWA (136). To ensure the robustness of 
our findings, we conducted separate GWASs 
with independent holdout datasets to repli- 
cate the identified loci. We also conducted sex- 
specific SNP heritability and GWAS analyses 
to compare the genetic effects on CMR traits 
between males and females. Additionally, we 
generated PRS (70) to assess the proportion of 
variation in CMR traits that could be predicted 
by genetic variants in European and non- 
European testing cohorts. To investigate gene- 
level associations, we used MAGMA (95), and 
we mapped GWAS signals to genes using func- 
tional genomic information in FUMA (38). 

We used GWAS results of CMR traits to un- 
cover the genetic overlaps with other complex 
traits and diseases previously identified in 
GWASs, including our brain MRI traits and 
those catalogued in the NHGRI-EBI GWAS 
database (71). We applied Bayesian colocal- 
ization analysis (72) to examine the presence 
of shared causal genetic variants underlying 
genetic pleiotropy. Additionally, we used cross- 
trait LDSC (102) to estimate genome-wide ge- 
netic correlations between CMR traits and 
other complex traits and diseases. 

We further investigated genetic associations 
by examining the relationship between PRS of 
CMR traits and phenotypes collected in the 
UKB study. We also used additional data resources,
such as FinnGen (104), which provided
GWAS results on brain-related clinical
outcomes, to conduct a two-sample MR analysis
(103) to investigate the genetic causal
relationships between CMR traits and brain 
disorders. Additionally, we evaluated the pre- 
dictive ability of CMR traits for complex traits 
and diseases in the UKB study, and improved 
prediction accuracy by integrating genetic PRS, 
CMR traits, and brain MRI traits.

Autonomous alignment and healing in multilayer soft
electronics using immiscible dynamic polymers


Christopher B. Cooper†, Samuel E. Root†, Lukas Michalek, Shuai Wu, Jian-Cheng Lai,
Muhammad Khatib, Solomon T. Oyakhire, Renee Zhao, Jian Qin, Zhenan Bao*


Self-healing soft electronic and robotic devices can, like human skin, recover autonomously from 
damage. While current devices use a single type of dynamic polymer for all functional layers to ensure 
strong interlayer adhesion, this approach requires manual layer alignment. In this study, we used two 
dynamic polymers, which have immiscible backbones but identical dynamic bonds, to maintain interlayer 
adhesion while enabling autonomous realignment during healing. These dynamic polymers exhibit a 
weakly interpenetrating and adhesive interface, whose width is tunable. When multilayered polymer films 
are misaligned after damage, these structures autonomously realign during healing to minimize 
interfacial free energy. We fabricated devices with conductive, dielectric, and magnetic particles that 
functionally heal after damage, enabling thin-film pressure sensors, magnetically assembled soft robots,
and underwater circuit assembly.


Self-healing allows soft electronic devices 
to recover from various forms of dam- 
age, such as punctures, scratches, and 
slices, to improve device robustness and 
lifetime. Previous work has demonstra- 
ted self-healing polymers that use a range of 
dynamic bonds, such as hydrogen bonding 
(1-3), metal-ligand coordination (4-6), or dynamic
covalent bonds (7). These polymers are
generally insulating; thus, to make functional
electronic devices, they are embedded with 
conductive or dielectric materials (e.g., par- 
ticles, nanowires, nanotubes, flakes, etc.) to 
achieve the desired bulk electrical properties 
while retaining the soft mechanical properties 
of the self-healing polymer matrix. These self- 
healing composites can recover not only their 
original mechanical properties upon healing 
but also their electrical conductivity (8). 
Many self-healing devices have been re- 
ported, including aquatic skin, field-effect 
transistors, light-emitting capacitors, battery- 
based sensors, and advanced multifunctional 
sensing platforms (9-11). As the complexity of 
devices has increased, it has become necessary 
for self-healing to simultaneously occur be- 
tween multiple layers with different functions. 
This concept was shown for an electronic skin 
that integrated multiple functional compo- 
nents, but thick layers and careful manual 
alignment were needed to ensure functional 
self-healing between all layers (8). A similar 
problem was encountered for self-healing tran- 
sistors, which saw a decreased drain current by 
almost one order of magnitude, owing to imperfect
alignment of the source and drain
electrodes (12). 

Self-healing devices have required manual 
alignment after damage to properly align different
functional components, which is impractical
for thin devices (<~100 μm) (10). When
the fractured surfaces of a multilayered device 
are brought back into contact, even slightly 
misaligned layers can limit functional recov- 
ery. This issue stems from the use of only a 
single type of self-healing polymer throughout 
the device. Although using the same polymer 
for all functional components ensures strong 
interlayer adhesion, there is no selectivity 
during healing between different functional 
components to drive realignment. 

We demonstrate a multilayered self-healing 
device composed of a pair of self-healing poly- 
mers with identical dynamic bonds but im- 
miscible polymer backbones. When misaligned 
after damage, these multilayer structures have 
composition gradients that drive directional 
chain diffusion to enable autonomous realign- 
ment. Moreover, the similar dynamic bonds 
between the polymers enable strong interfacial 
adhesion between the otherwise immiscible 
layers. We prepared conductive and insulat- 
ing composites to form thin-film pressure 
sensors, magnetically assembled soft robots, 
and underwater circuits, which readily self- 
heal after mechanical damage. The minimal 
interlayer diffusion between the polymers also 
prevents diffusion of the embedded particles, 
which preserves each layer’s electronic func- 
tion and prevents damage-induced mixing. 


Molecular design of immiscible dynamic polymers 


We selected polydimethylsiloxane (PDMS) and 
polypropylene glycol (PPG) as model immiscible 
backbone polymers because they are flexible,
amorphous polymers with low glass transition
temperatures (T_g,PDMS = −125°C and T_g,PPG =
−75°C) and different bulk surface free energies
(γ_PDMS ≈ 21 mJ/m² and γ_PPG ≈ 31 mJ/m²)
(13, 14). To minimize the effect of film micro- 
structure on self-healing properties, we in- 
corporated a combination of bisurea bonds 
formed from both 4,4’-methylene bis(phenyl 
isocyanate) (MPU) and isophorone diisocyanate 
(IU) into each polymer, which has been shown 
to produce amorphous, self-healing films without
nanoscale aggregation (1, 15-17). The
strong directional binding of the MPU units
incorporates elasticity into the network, while
the weaker binding interactions of the IU units 
provide a stress-dissipation mechanism to im- 
prove bulk toughness and prevent formation 
of microstructures. For both synthesized polymers,
we tuned the MPU:IU ratio and the average
backbone molecular weight (M_n) to achieve
healing dynamics between 30° and 100°C
with solid-like properties at room temperature,
which are necessary for device fabrication
and stability. The PDMS-based polymer,
hereafter referred to as PDMS-HB, has an
average M_n of 5 kDa for the PDMS backbone
repeat units, an MPU:IU molar ratio of 0.3:0.7,
and an overall number-averaged molecular
weight (M_n) of 46 kDa [dispersity (Đ) ~ 1.5].
The PPG-based polymer, hereafter referred to
as PPG-HB, has an average M_n of 0.75 kDa for
the PPG backbone repeat units, an MPU:IU
ratio of 0.5:0.5, and an M_n of 10 kDa (Đ ~ 1.7)
(Fig. 1, A and B, and table S1).

We confirmed the lack of larger microstruc- 
tures by small-angle x-ray scattering (SAXS), 
which gave characteristic domain spacings of 
between 6 and 9 nm (Fig. 1C and table S1). 
Moreover, both polymers exhibit a crossover 
between the storage and loss modulus between 
75° and 85°C (Fig. 1D and table S1) and glass 
transition temperatures well below room temperature
(T_g,PDMS-HB < −80°C, T_g,PPG-HB =
−35°C; fig. S1), which enables experimentally
accessible healing dynamics. PPG-HB has less 
than one-fifth the M_n of PDMS-HB but similar 
mechanical and thermal properties, which is 
consistent with previous work that has found 
that polyethers destabilize hydrogen bond for- 
mation and require higher density to achieve 
a given mechanical property (2). The differ- 
ence in surface energies between PDMS-HB 
(23 mJ/m²) and PPG-HB (44 mJ/m²) was experimentally
confirmed by contact angle measurements
(fig. S2 and tables S1 and S2) (18).
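
A short arithmetic check of the molecular weights quoted above may help in reading the "less than one-fifth" comparison. The sketch below uses only the values stated in the text (backbone segments of 5 and 0.75 kDa; overall values of 46 and 10 kDa; dispersities of about 1.5 and 1.7). The dictionary and function names are hypothetical, and reading the comparison as referring to the backbone segments is an assumption suggested by the numbers, not a statement from the authors.

```python
# Values as quoted in the text, in kDa (dispersity is dimensionless).
BACKBONE_MN_KDA = {"PDMS-HB": 5.0, "PPG-HB": 0.75}
OVERALL_MN_KDA = {"PDMS-HB": 46.0, "PPG-HB": 10.0}
DISPERSITY = {"PDMS-HB": 1.5, "PPG-HB": 1.7}


def mw_from_dispersity(mn_kda, dispersity):
    """Weight-averaged molecular weight from the definition D = M_w / M_n."""
    return mn_kda * dispersity


backbone_ratio = BACKBONE_MN_KDA["PPG-HB"] / BACKBONE_MN_KDA["PDMS-HB"]
overall_ratio = OVERALL_MN_KDA["PPG-HB"] / OVERALL_MN_KDA["PDMS-HB"]

if __name__ == "__main__":
    print("backbone ratio:", backbone_ratio)   # below one-fifth
    print("overall ratio:", overall_ratio)     # slightly above one-fifth
    for name in OVERALL_MN_KDA:
        print(name, "approx. M_w (kDa):",
              mw_from_dispersity(OVERALL_MN_KDA[name], DISPERSITY[name]))
```

Only the backbone ratio falls below one-fifth, which is consistent with the comparison being made between the backbone segments rather than the overall chains.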

Fig. 1. Design and characterization of a pair of dynamic polymers, PDMS-HB and
PPG-HB, with immiscible backbones and identical hydrogen bonding units.
(A) Schematic showing the principle of surface tension-mediated realignment and
healing of a fractured multilayer laminate. The difference in surface energy
between the two polymer backbones (type A and type B) drives realignment, while
the dynamic bonds in both polymers promote interlayer adhesion for device
performance. (B) Chemical structures of the two immiscible dynamic polymers used
in this study, PDMS-HB and PPG-HB. x, mole fraction of MPU; n, average number of
monomers in the backbone segment between dynamic bonds. (C) SAXS curves showing
the amorphous structure of PDMS-HB (black) and PPG-HB (pink) with domain sizes
of ~6 to 9 nm. (D) Rheological characteristics of PDMS-HB (black) and PPG-HB
(pink) showing the crossover between the storage modulus (G', solid squares) and
loss modulus (G", open circles) around 75° to 85°C. This crossover point
corresponds to the onset of flow in the bulk materials. (E) Schematic of the
experimental setup of the self- or interfacial healing between two polymers.
The recovery in tensile strength (F), max displacement (G), and interfacial work
(H) for self-healed PDMS-HB (black squares), self-healed PPG-HB (pink circles),
and PDMS-HB healed with PPG-HB (purple triangles). Each point is averaged
over three samples, with a healing time of 30 min at the specified temperature.
Optical microscope images of a spin-coated film of PDMS-HB and PPG-HB
(50 wt %) immediately after casting (I) and after annealing at 70°C for
24 hours (J) and 168 hours (K), and the corresponding AFM nanomechanical
images (L to N). Phase separation increases with increased annealing time. The
moduli of neat PDMS-HB and PPG-HB measured by AFM are 1 and 20 MPa,
respectively, suggesting that the pink regions are PPG-rich.

Fig. 2. The interface between two immiscible dynamic polymer networks
with identical dynamic bonds. (A) AFM characterization of the modulus gradient
across the interfaces of bilayer films prepared by gently hot pressing (~20 kPa)
at (i) 50°C, (ii) 70°C, and (iii) 100°C, before (top) and after (bottom) annealing for
24 hours at 70°C. The higher-modulus region corresponds to pure PPG-HB (pink),
and the lower-modulus region corresponds to pure PDMS-HB (dark gray). Fitted
interfacial profiles obtained from the AFM images (iv) immediately after hot
pressing and (v) after annealing for 24 hours at 70°C, showing that the interfaces
are at thermodynamic equilibrium. (B) Coarse-grained molecular dynamics
simulation snapshots for equilibrated interfaces with (i) ε_AB = 0.97, (ii) ε_AB = 0.99,
and (iii) ε_AB = 0.995. Inset shows chains at the interface with two different polymer
backbones (A beads, black; B beads, pink) and identical dynamic bonds (X beads,
blue). Table gives the relative energetic attraction between different bead types.
(iv) Fitted interfacial profiles obtained from the equilibrated simulations. (v) Dynamics
of the fitted interfacial width during simulation while approaching equilibrium.
(C) (i) Schematic of the field-theoretic model, showing two polymer backbones
(A, gray; B, pink) with a repulsive χ_AB interaction, an identical dynamic bond (X, blue)
with an attractive ε_XX interaction, and a chain length of N. (ii) Interfacial profiles
predicted by the field-theoretic model for different values of χ_AB normalized by chain
length: 2/N (orange), 3/N (green), 8/N (teal), 16/N (blue), and 32/N (purple). (iii)
Sticker volume fraction across the interface for the same χ_AB values, showing that
dynamic bonds cluster at the interface with increasing χ_AB.

We characterized the self-healing behavior
of PDMS-HB and PPG-HB by adapting a recently
reported technique, wherein disks of
polymer are healed on a parallel plate rheometer
with a contact area defined by a
polytetrafluoroethylene (PTFE) sheet with a hole (Fig.
1E) (2, 19). After annealing, the plates were
pulled apart at a constant rate to generate
stress-displacement curves, similar to those obtained
on an extensometer (figs. S3 and S4). We repeated
this process for different temperatures
in 10°C steps and monitored healing by track- 
ing the recovery in the tensile strength (Fig. 
1F), the max displacement (Fig. 1G), and the 
interfacial work (i.e., the area under the stress- 
displacement curve; Fig. 1H) as a function of 
healing temperature (fig. S3). In all cases, we 
observed a plateau at higher temperatures in- 
dicative of full healing. Both PDMS-HB and 
PPG-HB were fully healed after 30 min at ~80° 
to 90°C, consistent with their terminal flow 
onset temperatures (Fig. 1D and table S1). These 
results are also consistent with experimental 
work on interfacial healing of metallosupra- 
molecular polymers as well as theoretical and 
computational predictions (20-22). 
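
The three healing metrics tracked above (tensile strength, max displacement, and interfacial work as the area under the stress-displacement curve) can be sketched as follows. This is a minimal illustration using the trapezoidal rule; the function names and the sample curves are hypothetical and are not data from the study.

```python
def interfacial_work(displacement, stress):
    """Area under a stress-displacement curve by the trapezoidal rule."""
    if len(displacement) != len(stress):
        raise ValueError("displacement and stress must have the same length")
    work = 0.0
    for i in range(1, len(displacement)):
        step = displacement[i] - displacement[i - 1]
        work += 0.5 * (stress[i] + stress[i - 1]) * step
    return work


def recovery_percent(healed_value, pristine_value):
    """Healing efficiency of one metric, relative to the pristine sample."""
    return 100.0 * healed_value / pristine_value


if __name__ == "__main__":
    # Hypothetical curves: displacement in mm, stress in kPa.
    pristine = ([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 2.0, 0.0])
    healed = ([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])

    w_pristine = interfacial_work(*pristine)
    w_healed = interfacial_work(*healed)
    print("work recovery (%):", recovery_percent(w_healed, w_pristine))
    print("strength recovery (%):",
          recovery_percent(max(healed[1]), max(pristine[1])))
    print("displacement recovery (%):",
          recovery_percent(max(healed[0]), max(pristine[0])))
```

Plotting each recovery value against healing temperature would show the plateau that the text interprets as full healing.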

We next evaluated the interfacial healing 
between PDMS-HB and PPG-HB, which have 
identical dynamic bonds but immiscible poly- 
mer backbones. The self-healing of two poly- 
meric interfaces involves wetting between the 
two-dimensional (2D) interfaces and then cre- 
ation of a 3D interphase that propagates with 
macromolecular diffusion and possibly poly- 
mer reentanglement to restore bulk properties 
(23). Neumann et al. found that for self- 
healing of metallosupramolecular polymers, 
full healing was achieved only when the 3D 
interphase reached widths on the order of 
~100 nm (20). We hypothesized that the use of 
similar dynamic bonds would enable wetting 
and adhesion of the 2D interface, while the 
difference in surface free energies between 
PDMS-HB and PPG-HB would limit the width 
of a 3D interphase, approaching the limiting 
case of two fully immiscible polymer blends 
(24, 25). Compared with the self-healing cases, 
the PDMS-HB:PPG-HB interface exhibited 
reduced healing, even at 100°C, when both 
samples exhibit rapid dynamics and liquid- 
like behavior. The tensile strength recovered 
almost immediately to ~70% of the PDMS-HB 
pristine interface, indicative of good wetting 
between the surfaces. However, both the max 
displacement and interfacial work recoveries 
remained lower (<20%) than the healed sam- 
ples with two pieces of identical polymers (Fig. 
1, F to H). 

These results suggest that healing between 
the samples is thermodynamically restricted 
because of a lack of macromolecular diffusion 
across the interface. To further test this hy- 
pothesis, we spin-coated a film of PDMS-HB 
and PPG-HB (50 wt % blend) from a homoge- 
neous solution and then annealed the sample 
at 70°C for various lengths of time. Optical 
microscope images (Fig. 1, I to K) and atomic 
force microscopy (AFM) images (Fig. 1, L to N) 
showed increasing phase separation with in- 
creased annealing, with coarsening occurring 
across all measured length scales. These ob- 
servations suggest that PDMS-HB and PPG-HB 
are thermodynamically immiscible and explain
the limited healing observed between the two
polymers. In addition, we measured interfacial 
healing between PDMS-HB and PPG-HB at 
70°C for longer healing times and showed that 
minimal additional healing occurred (fig. S5). 
This finding implies that increased healing at 
higher temperatures (Fig. 1, F to H) arises not 
from faster polymer dynamics but rather from 
increased miscibility. 


Interface between two immiscible 
dynamic polymers 


To further test this hypothesis, we character- 
ized the interface between PDMS-HB and 
PPG-HB through a combination of experiments, 
simulation, and theory. We laminated two layers 
of PDMS-HB and PPG-HB together by gently 
hot pressing (~20 kPa of pressure; fig. S6) at 
different temperatures and then measured cut 
interfaces by AFM before and after annealing 
at 70°C (26, 27). Tracking changes in the mod- 
ulus revealed an interface between PDMS-HB 
and PPG-HB (Fig. 2A), whose width we mea- 
sured quantitatively by fitting to a sigmoidal 
function (Eq. 1), analogous to the analytic so- 
lution by Helfand and Tagami for the interface 
between two immiscible polymers (24)

$$\phi(z) = \frac{1}{1 + e^{-(z - z_0)/\xi}} \qquad (1)$$

where φ(z) is the volume fraction of one polymer as a
function of position, z_0 is the location of the
interface, and ξ is a measure of the interfacial
width. The fitted interfacial widths (ξ)
increased with increasing hot-pressing temperature,
with values of 13 ± 1, 23 ± 1, and 39 ± 1 nm
for 50°, 70°, and 100°C hot-pressed films,
respectively (Fig. 2A and fig. S7). However, if
subsequently annealed at the same temperature,
all films exhibited similar interfacial
widths (Fig. 2A and fig. S7) of 23 ± 1, 26 ± 2,
and 20 ± 1 nm for the initially 50°, 70°, and
100°C hot-pressed films, respectively. These
observations suggest that the interfaces are at 
thermodynamic equilibrium during hot pres- 
sing and annealing. Moreover, these interfaces 
were all measured at room temperature, with- 
out rapid quenching, which means that the 
interfacial width (and thus the interlayer ad- 
hesion) can be programmed at a specific tem- 
perature and then locked in place by cooling 
the chains into a kinetically trapped state. To 
demonstrate this concept, we performed an 
interfacial healing experiment at 100°C for 
30 min with an additional annealing step at 
70°C for 30 min (fig. S8). Consistent with the 
decreased interfacial width measured after 
annealing, the interfacial work between the 
two polymers decreased with the additional 
annealing step. 
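
To make the width-fitting step concrete, the sketch below extracts an interface position and width from a volume-fraction profile. It assumes that Eq. 1 has the logistic form φ(z) = 1/(1 + exp(-(z - z0)/ξ)), and it uses a logit linearization with ordinary least squares rather than whatever fitting routine the authors used. All names and the synthetic profile are hypothetical.

```python
import math


def sigmoid_profile(z, z0, xi):
    """Assumed form of Eq. 1: volume fraction across the interface."""
    return 1.0 / (1.0 + math.exp(-(z - z0) / xi))


def fit_interface(z_values, phi_values, eps=1e-3):
    """Estimate (z0, xi) from a profile via logit(phi) = (z - z0) / xi.

    Points with phi outside (eps, 1 - eps) are skipped because the logit
    diverges there and amplifies measurement noise.
    """
    points = [(z, math.log(p / (1.0 - p)))
              for z, p in zip(z_values, phi_values) if eps < p < 1.0 - eps]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points inside the interface")
    mean_z = sum(z for z, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((z - mean_z) ** 2 for z, _ in points)
    sxy = sum((z - mean_z) * (y - mean_y) for z, y in points)
    slope = sxy / sxx                      # equals 1 / xi
    intercept = mean_y - slope * mean_z    # equals -z0 / xi
    return -intercept / slope, 1.0 / slope


if __name__ == "__main__":
    # Synthetic, noise-free profile (positions in nm) with a 23 nm width.
    zs = [float(z) for z in range(-100, 101, 10)]
    phis = [sigmoid_profile(z, 5.0, 23.0) for z in zs]
    z0_fit, xi_fit = fit_interface(zs, phis)
    print("z0 (nm):", z0_fit, "xi (nm):", xi_fit)
```

For noisy AFM modulus profiles, a nonlinear least-squares fit of the profile itself would usually be preferable to the linearized form, which weights the tails heavily.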

We also estimated the interdiffusion depth 
of PDMS-HB into bulk PPG-HB by performing 
x-ray photoelectron spectroscopy (XPS) on interfaces
healed for 30 min at 70° and 100°C and
then mechanically separated at room temperature.
We saw a clear decrease in the Si/C ratio
with increased sputtering, which allowed us to 
estimate the molar fraction of PDMS-HB as a 
function of depth from the interface (figs. S9 
and S10). This yielded an interfacial width (ξ) 
of 7 nm at 100°C and 2 nm at 70°C. The in- 
creased interfacial width at higher temperature 
matches the trend observed through AFM. 

We next sought to model this process in a 
general manner by conducting coarse-grained 
molecular dynamics simulations of the inter- 
face between two identical polymers contain- 
ing identical dynamic bonds but immiscible 
backbones (Fig. 2B) (28, 29). We periodically 
spaced dynamic bonding beads along the polymer
backbone with increased interaction energy
(ε_X = 5ε) relative to the backbone beads
(ε_B = 1ε). Following previous simulations of
homopolymers, immiscibility of the backbones
was introduced by decreasing the interaction
energy between distinct backbone beads from
ε_AB = 1ε (self-healing) to ε_AB = 0.95ε (entirely
immiscible) (30). Independently prepared
slabs of the two polymer species were brought 
together in the melt state and allowed to 
interdiffuse over time until the interface 
reached a thermodynamic equilibrium (Fig. 
2B). Throughout the simulation, the inter- 
facial width was tracked as a function of 
time by fitting Eq. 1 to the extracted density 
profiles (Fig. 2B). Consistent with experi- 
ments, we observed a sigmoidal density profile 
and that the equilibrium interfacial width 
(measured as a correlation length by fitting 
to Eq. 1) decreased with increasing backbone 
immiscibility. 

Finally, we developed a field-theoretic de- 
scription of the interface between two im- 
miscible polymer backbones (denoted by A or 
B monomers) with the same dynamic bonding 
units (denoted by X monomers). The model 
predicts the monomer density profiles for an 
incompressible melt of AX and BX block co- 
polymers of the same chain length N, whose 
interactions are dominated by a pairwise, repulsive
χ parameter between A and B monomers
(χ_AB) and a pairwise, attractive parameter
between X monomers (ε_XX) (Fig. 2C). With
increasing χ_AB, analogous to decreasing temperature
in experiments, we observed a decrease
in the interfacial width between the
polymers (Fig. 2C). In addition, we also observed
an increase in sticker clustering at the
AX-BX interface with increasing χ_AB (Fig. 2C),
where stickers at the interface reduced the
system free energy by minimizing the number
of A-B contacts. This is further supported by
the fact that the density profiles are almost
independent of ε_XX (fig. S11).

Fig. 3. Autonomous alignment and healing between immiscible dynamic
polymers in a multilayered film. Cross-sectional optical microscope images of
(A) the pristine hot-pressed multilayer laminate, (B) the damaged and misaligned
laminate, and (C) the healed and realigned laminate after annealing for 24 hours
at 70°C. A small amount of blue dye was added to PPG-HB for optical contrast, which
appears pink in dark-field images at higher magnifications. The cross-linked
PDMS substrate (bottom layer) was unable to heal and marks the damage site.
(D to F) Simulation snapshots showing how an initially misaligned and separated
laminate aligns and heals over time. (G) Misalignment distance (δ), normalized by the
chain R_g, decreases linearly with simulation time, normalized by the time to diffuse one
chain R_g (τ_Rg), until alignment is achieved. The slope of -0.1 corresponds to a
realignment rate of 0.1 R_g per τ_Rg.

The combination of experiments, simulation,
and theory suggests that with increasing temperature,
the interface between two immiscible
dynamic polymers is governed by a decreasing
χ parameter (χ_AB) between the polymer backbones,
which increases the interfacial width. In
addition, dynamic bonds cluster at the interface
to reduce contacts between the immiscible
backbones. When normalized by estimated
radius of gyration, R_g, values for PDMS-HB and
PPG-HB (~6 and ~3 nm, respectively, assuming
homopolymers with similar M_n), the interfacial
widths measured experimentally by AFM
are larger than those observed in simulations
or predicted by theory (31). However, the normalized
interfacial widths obtained by XPS are
well within the observed values. We attribute
the larger interfacial widths measured by AFM
to finite tip-size broadening during the indentation
measurement but note that the important
qualitative trends remain consistent across
experiments, simulation, and theory (32). Moreover,
the consistency between experiments,
simulation, and theory suggests that these
models could be used to screen polymer backbones
and dynamic bonding linkers for desirable
mechanical and healing properties.


Autonomous alignment and healing of
multilayered polymer films


We next tested the healing of multilayer films
of PDMS-HB and PPG-HB. We hypothesized
that the reduced interfacial healing between
the polymers would enable autonomous realignment
of the films after damage. Taking
advantage of the immiscibility of PDMS-HB
and PPG-HB, we stacked alternating films
with a thickness of ~100 μm and hot pressed
them to a final film with a thickness of ~70 μm
and 11 alternating layers, with individual thicknesses
ranging from 3 to 15 μm (Fig. 3A). The
resulting film was placed on a cross-linked 
PDMS substrate and then cut in half. Figure 
3B shows the misalignment between the al- 
ternating layers as well as the cut extending 
into the cross-linked PDMS substrate. During 
healing, the layers autonomously realigned and 
reformed sharp and alternating interfaces be- 
tween the PDMS-HB and PPG-HB (Fig. 3C). 
The misaligned cut in the cross-linked PDMS 
(which was not able to self-heal) remains vis- 
ible. When the same type of dynamic polymer 
was used for both layers, autonomous realign- 
ment during healing was not observed (fig. S12). 

Fig. 4. Demonstration of functional layer recognition and healing in soft
electronic devices based on dynamic polymer composites. (A) Schematic
of a pressure-sensitive capacitor with electrodes made from a PPG-HB:carbon
black 4:1 weight ratio composite, and dielectric layers made from a PDMS-HB:
SrTiO3 4:1 weight ratio composite. Dark-field optical images of the cross sections
of the initial capacitor (B), the capacitor after fracture showing layer
misalignment (C), and the healed device after realignment during annealing for
24 hours at 70°C (D). (E) Initial (top) and healed (bottom) pressure-sensing
performance as a time series. Capacitance was monitored while cyclically
applying pressures ranging from 0 to 80 kPa. (F) Capacitance versus pressure
showing the linear dependence of the capacitance on pressure, with minimal
change in drift and hysteresis between the initial (top) and healed (bottom)
sensor. (G) Series capacitance and resistance as the device is cut and healed at
room temperature, and after annealing at 70°C. (H) Schematic of core-shell
magnetic fibers made from a PPG-HB:NdFeB flake 1:4 weight ratio composite and
PDMS-HB. (I) Magnetic assembly of the core-shell fibers. (J) Thermal welding
of the assembled fiber at 70°C for 5 min with a heat gun. (K) Images of the
welded device bending, twisting, and stretching to show mechanical robustness.
(L) Schematic of double core-shell fibers with separate electrically conductive
(PPG-HB:Ag flake 1:1 weight ratio composite) and magnetic (PPG-HB:NdFeB flake
1:4 weight ratio composite) layers with PDMS-HB shells. (M) Images of the
underwater circuit assembly of the LED. (N) Current-voltage sweeps of the initial
device (dashed black), after room temperature underwater healing (solid red),
and after annealing at 70°C for 72 hours (solid blue).

The phenomenon of autonomous realignment and healing in multilayer
structures was also observed in our coarse-grained simulation model. In the
simulation, polymer surfaces fused together with an initial misalignment
(denoted δ) and then selectively interdiffused and steadily realigned until
reaching complete alignment (Fig. 3, D to G, and fig. S13). The
correspondence between simulation and experiment suggests that this
phenomenon can be generalized to other pairs of polymers to simultaneously
achieve strong interlayer adhesion and selective interlayer healing.
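
The linear realignment reported for the simulation (Fig. 3G) lends itself to a quick back-of-the-envelope estimate. The sketch below assumes only the reported rate of 0.1 Rg per unit of chain diffusion time; the function names and the starting misalignment of 2 Rg are hypothetical and chosen purely for illustration.

```python
REALIGNMENT_RATE = 0.1  # Rg per tau_Rg, magnitude of the slope reported for Fig. 3G


def misalignment(delta0: float, t: float, rate: float = REALIGNMENT_RATE) -> float:
    """Misalignment (in Rg) after time t (in tau_Rg), assuming linear decay to zero."""
    return max(delta0 - rate * t, 0.0)


def time_to_align(delta0: float, rate: float = REALIGNMENT_RATE) -> float:
    """Time (in tau_Rg) needed to remove an initial misalignment delta0 (in Rg)."""
    return delta0 / rate


if __name__ == "__main__":
    delta0 = 2.0  # hypothetical initial misalignment, not a value from the study
    for t in (0, 5, 10, 20, 30):
        print(f"t = {t:>2} tau_Rg -> delta = {misalignment(delta0, t):.2f} Rg")
    print(f"aligned after about {time_to_align(delta0):.0f} tau_Rg")
```

Under this linear picture, the time to full alignment scales in direct proportion to the initial offset, so doubling the misalignment doubles the healing time.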


Demonstration of functional healing for 
soft electronics 


To demonstrate that the use of alternating 
layers of immiscible dynamic polymers could 
promote alignment during the healing of a 
thin (~10 to 100 μm) multilayered electronic
device, we investigated the functional healing 
of a pressure-sensitive capacitor (Fig. 4A). The 
capacitor was made from alternating layers 
of homogeneous composites of PDMS-HB 
embedded with dielectric strontium titanate 
(SrTiO3) microparticles (20 wt %) and PPG-HB 
embedded with conductive carbon black nano- 
particles (20 wt %). Figure 4, B to D, shows 
microscope images of a cross section of the 
parallel plate capacitor in the pristine, dam- 
aged, and healed states. Even when misaligned 
after damage, the multilayer capacitor re- 
aligned during healing and recovered its sen- 
sing capability, exhibiting quantitatively similar 
pressure-sensing performance when subjected 
to the same cyclic loading conditions (Fig. 4, E 
and F). We also monitored the change in the 
series capacitance and resistance immediately 
after damage (Fig. 4G). Only a partial recovery 
of the capacitance was observed at room tem- 
perature, and microscope images of the edge 
of the device showed misaligned layers (Fig. 
4C). However, after heating, the layers almost 
completely realigned (Fig. 4D), and the device 
recovered 96% of its initial capacitance. Me- 
chanical recovery was also confirmed by man- 
ually applying a tensile force to the sample, 
resulting in fracture of the underlying sub- 
strate while maintaining integrity of the 
healed layers (fig. S14). 

As an additional demonstration of the util- 
ity of this pair of selectively weldable dynamic 
polymers, we fabricated core-shell fiber struc- 
tures with composites containing magnetic 
NdFeB microflakes (~10 μm, 80 wt %) embedded
in PPG-HB as the core material and
PDMS-HB as the shell material (Fig. 4H). These 
fibers are magnetized along their longitudinal 
directions with an impulse magnetizer (1.5 T). 
When cut into pieces, the fibers’ motion could 
be controlled with an external magnetic field 
to achieve rigid body rotations for reassem- 
bling without any manual alignment. When 
close in distance, these fibers exhibited a mag- 
netic attractive force that induced a contact 
pressure to promote selective welding of the 
layers (Fig. 4I and movie S1). After thermal 
welding at 70°C, the magnetically assembled
fibers could withstand bending, twisting, and
stretching deformations (Fig. 4, J and K). In 
contrast to single-component magnetic self- 
healing, which achieves macroscopic assembly 
of pieces but lacks the precision for microscop- 
ic alignment, this work shows that we can 
simultaneously employ two alignment mecha- 
nisms: magnetically guided macroscopic 
alignment and interfacial-tension mediated 
microscopic alignment (33-36). 
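
The filler loadings are quoted in two ways: as weight percentages in the text (20 wt % and 80 wt %) and as polymer:filler weight ratios in the Fig. 4 caption (4:1, 1:4, and 1:1). The short sketch below, with a hypothetical helper name, shows that the two descriptions agree.

```python
def filler_wt_percent(polymer_parts: float, filler_parts: float) -> float:
    """Filler weight percent for a polymer:filler weight ratio."""
    return 100.0 * filler_parts / (polymer_parts + filler_parts)


if __name__ == "__main__":
    composites = {
        "PPG-HB:carbon black 4:1": (4, 1),
        "PDMS-HB:SrTiO3 4:1": (4, 1),
        "PPG-HB:NdFeB 1:4": (1, 4),
        "PPG-HB:Ag 1:1": (1, 1),
    }
    for name, (polymer, filler) in composites.items():
        print(f"{name}: {filler_wt_percent(polymer, filler):.0f} wt % filler")
```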

Building on this demonstration, we fabri- 
cated multilayered magnetic wires with a con- 
ductive core, an insulating shell, a magnetized 
layer, and an outer encapsulating shell (Fig. 4L).
Two wires with opposite magnetic orienta- 
tion were assembled to make a light-emitting 
diode (LED) circuit. Upon cutting the wires 
into four pieces, the circuit could be reas- 
sembled by adding the components into a 
glass vial filled with water, where the magnetic 
forces guided the assembly of the wire to achieve 
almost instantaneous electrical healing, illumi- 
nating an LED (Fig. 4M, fig. S15, and movie S2). 
The magnetic forces guided the alignment of 
the two terminals of the LED in the correct 
orientation with respect to the voltage source 
(+3 V) and ground. Comparison of current- 
voltage sweeps showed comparable turn-on 
voltages before and after healing (Fig. 4N), 
with full mechanical and electrical healing 
achieved after thermal annealing at 70°C 
for 72 hours. 

In this study, we achieved autonomous align- 
ment during the self-healing of multilayered 
soft electronic devices by using two immiscible 
dynamic polymers, whose different backbones 
enabled interfacial tension-mediated realign- 
ment after damage. We used the same dy- 
namic bond in both polymers to maintain 
strong interlayer adhesion required for a stretch- 
able device. The interfacial width between 
the polymers, which subsequently determines 
the interlayer adhesion, can be programmed 
by annealing temperature. Simulation and 
theory results suggest that this design concept 
can be readily extended to other molecular sys- 
tems. We fabricated thin-film healable pressure 
sensors, magnetically assembled and welded 
structures, and self-healable underwater circuits 
that autonomously realign during healing to 
demonstrate the capabilities of this approach.

Aptameric receptors are important biosensor components, yet our ability to identify them depends on
the target structures. We analyzed the contributions of individual functional groups on small molecules 
to binding within 27 target-aptamer pairs, identifying potential hindrances to receptor isolation—for 
example, negative cooperativity between sterically hindered functional groups. To increase the 
probability of aptamer isolation for important targets, such as leucine and voriconazole, for which 
multiple previous selection attempts failed, we designed tailored strategies focused on overcoming 
individual structural barriers to successful selections. This approach enables us to move beyond 
standardized protocols into functional group-guided searches, relying on sequences common to 
receptors for targets and their analogs to serve as anchors in regions of vast oligonucleotide spaces
wherein useful reagents are likely to be found.


Aptamers are oligonucleotide-based receptors
isolated from random libraries
through cycles of enrichment based on 
target affinity coupled to amplifications 
(1-4). Aptamers can be selected for a va- 
riety of small molecules for which antibodies 
cannot—that is, targets ignored by the immune 
system even when conjugated to carrier pro- 
teins, such as neurotransmitters (5) and amino 
acids (6, 7). Once available, aptamers can be 
readily engineered into various sensor formats 
(3, 4), including for use as fluorescent (8), elec- 
trochemical (9), or electronic biosensors (10). 
One of the main obstacles to the broad ap- 
plication of aptamers in biosensing is a lack of 
aptamers with appropriate affinities for many 
important low-molecular-weight targets (3, 4). 
For example, we were repeatedly unable to iso- 
late DNA aptamers for two clinically important 
molecules, the amino acid leucine (Leu, 1) and 
the antifungal agent voriconazole (2) (Fig. 1A). 
Aptamers to detect blood leucine levels could
be used to rapidly clarify false positives during
newborn screening for maple syrup urine disease
(MSUD) (6, 11). We have sought to expand
on our success with vancomycin sensing (12)
and to isolate receptors that could be used for
voriconazole therapeutic monitoring (13). Our
attempts were variations of selections based 
on target-induced stem closure (Fig. 1B) (14, 15). 
In this approach, oligonucleotide libraries 
with internal random 36-nucleotide oligomer 
(36-mer) regions are immobilized through 5’- 
primer regions that hybridize with tethered 
capture sequences. Potential aptamers hybri- 
dized on columns are released by interactions 
with unmodified targets in solution, which can 
stabilize stem formation upon displacement 
(Fig. 1B). 

Our failure to isolate DNA aptamers for 
leucine was surprising because RNA aptamers 
had been previously isolated through affinity 
columns displaying tethered leucine (6). Sim- 
ilarly, voriconazole should have been a straight- 
forward target because of its aromatic surfaces 
and heteroatoms. However, we could neither 
adapt reported aptamers cross-reactive with the 
azole class of antifungals (17) as sensor compo- 
nents (8, 12), nor could we isolate specific 
aptamers. These two seemingly unrelated tar- 
gets, with substantially different molecular 
weights, share proximate pairs of sterically 
crowded sp³ carbons (Fig. 1A), which inspired
us to pursue a broader understanding of the 
general relationships between target struc- 
tures and outcomes of highly standardized 
selections. Our aim was to develop a gener- 
alizable approach to aptamer isolation that 
succeeds when other standard methods fail. 


Analysis of free energies of oligonucleotide 
displacement across related targets 


We amassed 27 aptamers, 23 of which were 
newly isolated through this study. These aptamers
emerged directly from selections and without
further optimization, being identified as the
highest-affinity receptors targeting amines,
amino acids, and their analogs. In the past, 
while working with individual aptamers, we 
focused on aptamer dissociation constants ob- 
tained by a fluorescence-quenching assay that 
reported fluorescently labeled aptamer compe- 
tition with a quencher-labeled capture oligo- 
nucleotide (Fig. 1C and displacement assay 
rationale, materials and methods); the assay 
could be adapted to a model of allosteric antagonism
to account for partial release upon binding (18). To
characterize the impact of targets on selection outcomes,
we instead needed to compare targets in their abilities
to outcompete capture oligonucleotides. Thus, we focused
on the equilibrium constant, appK_D (a midpoint response
or X_50%), of the displacement of oligonucleotide
competitor that is used on the affinity column during
selection, which is related to the Gibbs free energy of
displacement, ΔG_D. In contrast to the free energy of
binding, ΔG_B (obtained, for example, by isothermal
calorimetry), ΔG_D governs a comprehensive set of
equilibria that affects the release of aptamers from the
column upon target addition. The difference between ΔG_D
and ΔG_B is primarily in the contributions of the
capture oligonucleotide present at equilibria.
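
A worked conversion may help make the link between a midpoint constant and a free energy concrete. The sketch below assumes the textbook relation ΔG = RT ln(K / 1 M) at 298.15 K; the temperature, the 1 M standard state, and the function name are assumptions for illustration, and the corrections for oligonucleotide quenching described in the Fig. 1 caption are not included. The two inputs are the approximate constants quoted later in the text for the leucine aptamers.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def free_energy_kj_per_mol(k_molar: float, temperature_k: float = 298.15) -> float:
    """Return RT*ln(K / 1 M) in kJ/mol for an equilibrium constant in molar units."""
    if k_molar <= 0:
        raise ValueError("equilibrium constant must be positive")
    return R_KJ * temperature_k * math.log(k_molar)


if __name__ == "__main__":
    # Approximate constants quoted in the text; the conversion itself is an assumption.
    for label, k in (("~170 nM", 170e-9), ("~10 mM", 10e-3)):
        print(f"K = {label}: {free_energy_kj_per_mol(k):.1f} kJ/mol")
```

With these assumptions a tighter complex gives a more negative value, which matches the sign convention of the Fig. 1 caption, where a positive difference indicates a loss of affinity.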

The targets (table S3) and their aptamers
(figs. S4 to S34) were organized in related
pairs (Fig. 1D and figs. S41 to S45), with each
pair differing by the addition of a single functional
group or group transformation; for example,
methylamine (3) and phenylethylamine (4) differ
by the addition of a seven-carbon benzyl group
(Fig. 1D). We defined ΔΔG_GRP as the free-energy
difference related to the equilibria positions
affecting the relative outcomes of two selections,
attributable to the presence of the additional
functional group or transformation. We also assumed
the portions of ΔG_D that govern equilibria
unrelated to either target or capture oligonucleotide
binding to be similar across all aptamers and
predicted that they would largely cancel each other
when subtracting two ΔG_D values within pairs, which
allowed us to extract estimates of ΔΔG_GRP values
(Fig. 1D). Related
concepts on contributions to the free energy of 
binding associated with functional groups are 
often used for ligand optimization in medici- 
nal chemistry (19, 20), but there a receptor is 
shared by multiple targets. 
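
The cancellation argument can be sketched numerically. The example below treats each free energy of displacement as the sum of a target-specific term and a term common to all aptamers, then subtracts the values for a related pair. All numbers and names are hypothetical placeholders, not values from this study; the point is only that the common term drops out of the difference.

```python
def delta_g_d(target_term: float, common_term: float) -> float:
    """Free energy of displacement modeled as target-specific plus common contributions."""
    return target_term + common_term


def ddg_grp(dg_d_parent: float, dg_d_analog: float) -> float:
    """Group contribution: analog minus parent (positive means lower affinity)."""
    return dg_d_analog - dg_d_parent


# Hypothetical values in kJ/mol, chosen only to illustrate the arithmetic.
COMMON = -5.0                      # contribution assumed similar for all aptamers
parent = delta_g_d(-12.0, COMMON)  # e.g., the smaller target of a pair
analog = delta_g_d(-27.0, COMMON)  # the same target plus one functional group

print(ddg_grp(parent, analog))     # the common term cancels in the subtraction
```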

Two key assumptions, aside from nearly 
identical selection conditions, were needed to 
extend the concept of functional-group free- 
energy contributions to selections: 

First, there are ~10™ possible random 36-mers. 
In selections, we sample only ~10™ of these 
sequences. Thus, in the absence of extraordi- 
nary luck, we do not isolate unique receptors, 
but typical ones (21, 22), which are examples
of multiple sequences having similar affinity values broadly distributed
over oligonucleotide space. This sparse sampling then allows us to treat the
properties of the isolated aptamers, represented here by the “best” aptamer
from each selection, as characteristic of the highly standardized selection
conditions, libraries, and targets. Because selections for previously
identified aptamers differ mostly in their targets, we attribute large
changes in the properties of the aptamers to the impact of structural
differences between targets, that is, to specific functional groups.

Second, functional group contributions to selections can only be based on
well-known noncovalent interactions (20, 23). Thus, as a first
approximation, within a set of close analogs, we expect to be able to isolate
additive effects. When we observe systematic nonadditivities in
thermodynamic cycles, for example, cooperativity (ΔG_C) as estimated through
cycles of double replacements of functional groups (20, 24), we can analyze
nonadditivities to generate hypotheses about barriers to aptamer isolation
(Fig. 1E). Reciprocally, if correct in our assumptions, after initial
selection failures, we can perform functional group analysis of targets to
identify possible structural barriers leading to these failures and design
selection protocols to improve our chances of isolating aptamers.

Fig. 1. Target functional-group binding free-energy analysis for aptamers
from stem-loop libraries. (A) Using standard protocols, we were unable to
isolate aptamers for leucine (1) and voriconazole (2), which have congested pairs
of carbons (*). (B) Aptamer selection is driven by small-molecule-induced
stem closures. An oligonucleotide library with a random loop (N36) is hybridized
to the complement (capture strand) of a polymerase chain reaction (PCR)
primer. The capture strand is tethered to a column. The column is exposed to
target solutions. Sequences that bind the targets and undergo stem stabilization
are released, preferentially amplified, and used in the next selection cycle.
(C) We measured apparent appK_D values for aptamers and from these calculated
the free energies of displacement, ΔG_D, on the basis of a fluorescence
displacement assay associated with the equilibrium between an aptamer (labeled
with fluorescein, F) and a complementary oligonucleotide used for capture in
selection (labeled with a quencher, dabcyl, D). The addition of the target leads to
concentration-dependent increases in fluorescence through the equilibria shown.
Ky and Ka are dissociation constants for a target-aptamer complex without
competitor and an aptamer-competitor complex without target, respectively.
(D) We isolated the contributions of individual functional groups by subtracting
individual ΔG_D values of aptamer-target pairs, with these values corrected to
account for differences in oligonucleotide quenching. The two targets,
methylamine (MEA) (3) and phenylethylamine (PHEA) (4), differ by a benzyl
group. The difference in free energy associated with benzyl group addition is
ΔΔG_GRP (benzyl). The two aptamers used for this calculation are shown
(constant regions in lower case). (E) Cooperativity is assessed by double
functional group replacement cycles (24). The ΔG_D and appK_D (normalized to
average impact of oligonucleotide on equilibrium, in kJ/mol) values are
shown next to the targets, with ΔΔG_GRP values shown next to the fragments. A
ΔΔG_GRP >0 indicates a decrease in affinity upon adding a functional group; the
ΔG_C value (+8.0 kJ/mol) represents the difference between adding functional
groups separately (top horizontal and left vertical values) versus at the same
time (diagonally), which is interpreted as negative cooperativity when both a
benzyl group and a carboxylate are present together in a molecule.
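
A double functional group replacement cycle can be written out in a few lines. The sketch below assumes the sign convention stated in the Fig. 1 caption (a positive value means lower affinity) and defines cooperativity as the effect of adding both groups together minus the sum of the effects of adding each alone. The function name and all four free energies are hypothetical placeholders, not measurements from this study.

```python
def cooperativity(dg_parent: float, dg_a: float, dg_b: float, dg_ab: float) -> float:
    """Nonadditivity of a double replacement cycle, in the units of the inputs.

    dg_parent: target with neither group; dg_a, dg_b: one group added;
    dg_ab: both groups added. A positive result is read as negative cooperativity.
    """
    ddg_a = dg_a - dg_parent      # effect of adding group A alone
    ddg_b = dg_b - dg_parent      # effect of adding group B alone
    ddg_ab = dg_ab - dg_parent    # effect of adding A and B together
    return ddg_ab - (ddg_a + ddg_b)


# Hypothetical free energies of displacement in kJ/mol, for illustration only.
dg_c = cooperativity(dg_parent=-17.0, dg_a=-32.0, dg_b=-14.0, dg_ab=-23.0)
print(f"cooperativity = {dg_c:+.1f} kJ/mol")
```

If the two groups acted independently, the result would be zero; a positive result means the pair is recognized less well than the individual contributions would predict.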

Fig. 2. Analysis of ΔG_D and ΔΔG_GRP from a set of 27 aptamers. (A) Exemplary
targets used to characterize binding optimization to hydrophobic surfaces during
selections. (B) Regression analysis of target ΔG_D versus number of heavy atoms
(other than hydrogen) in aromatic hydrophobic fragments within targets.
Hydrophobic and aromatic fragments are shown as blue squares (amines) or red
diamonds (amino acids). The regression line, including methylamine and two
aromatic amines (3, 4, 8), was used to estimate the contributions of the
hydrophobic surfaces in two aromatic amino acids (6, 10) and a nonaromatic
hydrophobic amine related to leucine (7). Data for the four aptamers for 6 are shown
individually. Methylene blue (9) is the target with the highest affinity for the
aptamers isolated directly from N36 libraries. Two amides (CONH2, brown
circles; figs. S26 and S27) have ΔG_D values above the regression line, carboxylates
below (diamonds), and histamine (10) and serotonin (11) are on the regression line.
Unmarked data points are for tyramine (fig. S30), tyrosine (fig. S29), dopamine
(fig. S33), and L-dopa (fig. S20). R², coefficient of determination. (C) Additivity
of ΔΔG_GRP in similar compounds (compare to fig. S46). Using the average ΔΔG_GRP
values of a pair of planar indole-methylene-containing molecules and five
carboxamides, we estimated ΔG_B for the melatonin (13) aptamer. H, hydrogen; R,
other substituents. (D) Distributions of ΔΔG_GRP contributions of selected functional
groups for carboxylates, carboxamides, guanidiniums, and hydrophobic groups.
We show rounded averages (thick lines) and standard deviations in kJ/mol. We also
present (black crosses) cooperativities (ΔG_C) assessed through double functional
group replacement cycles (Fig. 1E) for groups added to methylamine together with
carboxylates and carboxamides to obtain individual amino acids and their amides
[figs. S41 to S45; the value for phenylalanine (6) is stressed]. All data points in (B)
to (D) are results of individual selection experiments, and the uncertainty of this
approach can be assessed by four aptamers for phenylalanine (6) in (B), which were
isolated in four independent selections. ND, not determined, N < 3.

We performed the following three tests with the available aptamers to assess
these assumptions. Although each test individually was limited because of
small sample sizes, together, they strongly supported our reasoning. First,
we analyzed the four highest-affinity aptamers for phenylalanine from four
separate selections and obtained similar ΔG_D values (and estimated ΔG_B
values) within <3 kJ/mol of each other (Fig. 2B and table S4). This result is
consistent with the affinity of winning aptamers being regularly distributed
over oligonucleotide space, thus representing a reproducible property of
selections. These findings suggest that large differences in target-related
ΔG_D should reflect differences in functional groups and not different
selections.

Second, we observed correlations between ΔG_D values and the numbers of heavy
(nonhydrogen) atoms in the hydrophobic fragments within related targets
(Fig. 2, A and B). The molecule with the largest hydrophobic surface,
methylene blue (9), yielded the highest affinity of all targets. The
correlation between methylamine (3) and two planar aromatic amines in our
set, 4 and 8, supports an argument that the applied selection pressure
directly optimizes affinity in proportion to hydrophobic surfaces (that is,
is based on the functional groups present) and that we can subtract two ΔG_D
values to isolate the impact of structural changes. We see indications that
functional group-based optimization is general, with methylbutylamine (7),
histamine (10), and serotonin (11) being very close to the aromatic amine
(3, 4, 8) regression line, although caution should be exercised not to
overinterpret these results without further structural information (24).
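
The regression logic can be illustrated with a small least-squares fit. The sketch below fits a line through three points and then expresses the offset of a fourth target from that line as an equivalent number of heavy atoms. Every number here is a hypothetical placeholder chosen to make the arithmetic easy to follow; none is a measurement from this study, and the helper names are likewise illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_in_heavy_atoms(slope, intercept, heavy_atoms, measured_dg):
    """Offset from the line, expressed as an equivalent number of heavy atoms lost."""
    predicted = slope * heavy_atoms + intercept
    return (measured_dg - predicted) / abs(slope)


# Hypothetical reference amines: heavy atoms in the hydrophobic fragment vs. free energy (kJ/mol).
HEAVY_ATOMS = [1, 8, 11]
DG_D = [-14.0, -28.0, -34.0]

slope, intercept = fit_line(HEAVY_ATOMS, DG_D)
# Hypothetical analog carrying an extra polar group: 8 heavy atoms, weaker than predicted.
lost = deviation_in_heavy_atoms(slope, intercept, heavy_atoms=8, measured_dg=-25.0)
print(f"slope = {slope:.2f} kJ/mol per heavy atom, offset = {lost:.1f} heavy atoms")
```

Reading an offset in heavy-atom equivalents keeps the comparison in the same units as the regression axis, which is how a polar substituent can be described as costing the contacts of a certain number of atoms.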
Fig. 3. Multistep functional group-guided approach to high-affinity
aptamers for leucine. (A) Leucine was split into two fragments, isobutyl and
2-aminoethanoate. We designed a selection to isolate aptamers sequentially to
recognize one, then both fragments, thereby reducing the target-related barriers
in each step and increasing the probability of finding leucine aptamers. Shown
are the complex of leucine and Cp*Rh(III) (14) and other amino acids for which
we observed challenging aptamer cross-reactivities (15, 16). (B) We started
with a Cp*Rh(III) aptamer, performing an N22 random insertion to form a library used
to identify the iBu.1 sequence, which contained motifs important for recognition of
the iBu group. We used a second N22 library (red) to focus selection pressure on the
2-aminoethanoate group to arrive at leucine aptamers. (C) Secondary structures
of the related aptamers CpLeu1.0, Leu2.1 (minimized from Leu2.0, fig. S34), and
CuLeu1.0. The inserted iBu.1 motif is shown in blue in CpLeu1.0, and carried-over
sections of the CpRh1.0 aptamer are shown in black in all three aptamers. The
CpLeu1.0 aptamer binds Leu in the presence of Cp*Rh(III), whereas Leu2.1 binds
leucine on its own. The CuLeu1.0 aptamer binds Leu in the presence of Cu(II) with high
affinity (appK_D ~170 nM). (D) Double-functional group replacement cycle, methylamine
(3) to leucine (1). (E) Fluorescence versus target concentrations in the presence
of 40 μM Cu(II) (displacement assay; compare to Fig. 1C) for CuLeu1.0 and three
branched-chain amino acids. RFU, relative fluorescence units; AA, amino acids.
(F) Preliminary analytical assessment of CuLeu1.0 in mock human serum samples
spiked with branched-chain amino acids to mimic values for patients with
MSUD (table S5 and fig. S53). The correlation is between measured values [dilution
1:500, 100 μM Cu(II)] of X'Le (the sensor-responsive fraction) versus added values
for {[Leu] + 0.57*[allo-Ile]}. The high correlation indicates that this sensor is a
suitable component of a minimal cross-reactive array for monitoring in patients,
although dilution might have to be adjusted depending on the target range.
Allo-Ile is negligible at birth and during the first several postpartum days (11); thus,
we also show correlation in the same mock samples but without allo-Ile (gray
circles indicate mock samples with both Leu and allo-Ile; black circles indicate
mock samples without allo-Ile). Measurements in (E) and (F) are in triplicates with
standard deviations shown (too small to be seen in E).


Third, we added average ΔΔG_GRP values calculated from two planar
indole-containing amines and five primary carboxamides (fig. S46) to obtain a
close match with an experimentally determined ΔG_B value for melatonin (13),
a planar molecule containing an indole and a secondary carboxamide (Fig. 2C;
the addition of ΔΔG_GRP values leads to ΔG_B). Thus, our protocol
simultaneously optimizes the presence of multiple functional groups, and we
can use this property to interpret deviations from additivity. Our standard
selection protocol (Fig. 1B) depends on target-induced oligonucleotide
displacement outcompeting background “noise” in the form of more common
processes (22, 25); here, most dominantly, ligand-independent oligonucleotide
release from the column. Then, throughout our protocol, certain combinations
of functional groups on targets decrease the overall probability of
isolating candidate aptamers, including in the early selection steps, which
could be critical for selection success.

We used the aromatic amine (3, 4, 8) regression line (Fig. 2B) to estimate
the impact of additional carboxylates on the ΔG_D values for the
aptamer-target complexes of two related aromatic amino acids, phenylalanine
(6) and tryptophan (12), observing that the addition of a carboxyl group is
similar to the loss of receptor hydrophobic contacts for between one and two
heavy atoms, which is intuitively consistent with the introduction of a polar
carboxylate near a primarily hydrophobic pocket. Our analysis of double
functional group replacement cycles (Figs. 1E and 2D, and figs. S41 to S45)
revealed substantial negative cooperativity while adding negative charge in
proximity to mismatched groups, such as hydrophobic residues (in
phenylalanine and tryptophan). These structural constellations, then, were
identified as likely to reduce the probability of aptamer isolation for
leucine.

Fig. 4. Selection of voriconazole aptamers using an analog. (A) Structure of
voriconazole (2) with three fragments (I to III) and its analog 2a, in which
fragment III was substituted with a methyl group (III). The arrows indicate the
perspective used to produce the Newman projections below. The anti (I, III)
conformation is similar to an observed crystal structure (30). The voriconazole
analog 2a simplifies the largest fragment (III) and was designed for reduced
complexity and as a more suitable target for selection. Here, the anti (I, II)
conformation is likely to be favored and to be the dominant epitope in selection.
(B) The aptamer Vor1.0 was isolated in the selection protocol that used 2 and 2a
in parallel. The secondary structure of Vor1.0 is shown as predicted by (top)


Functional group-guided selections for leucine 


We extended our analysis to hypothesize that the two out-of-plane carbons and
a carboxylate, all in proximity in leucine, act synergistically to minimize
contact surfaces and reduce affinities of typical aptamers, thus allowing
competing ligand-independent release mechanisms to dominate and suppress the
desired outcomes. To overcome this issue, we separated the selection steps
for the alkyl (isobutyl) and α-amino-carboxylate groups (Fig. 3, A and B). We
first implemented a protocol to identify a sequence, iBu.1, certain to
contain a binding motif for the isobutyl group. We started with a
cyclopentadienyl-rhodium(III) [Cp*Rh(III)]-binding aptamer (CpRh1.0)
specifically isolated for this purpose, as a temporary placeholder for
sequences that interact with the carboxyl and amino groups (26). We inserted
a random 22-mer region (N22), which would become iBu, into the CpRh1.0,
creating a new library (we can screen a complete 22-mer sequence space). From
this library, we selected aptamers such as CpLeu1.0, which bound leucine in
the presence of the Cp*Rh(III) cofactor. Although we could immediately
eliminate CpLeu1.0 from further consideration as a Leu sensor, because of its
complex mechanism of interactions with leucine, reflected in a sharp
threshold behavior of the fluorescent sensor (compare to fig. S38), we knew
that the inserted sequence, iBu, had to contain binding motifs for the Leu
side chain.
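
To put the 22-mer and 36-mer random regions in perspective, the sketch below counts the possible sequences for each length, assuming four nucleotides per position. The pool size of 1e14 molecules is an assumed round number for a selection library, not a figure taken from this study, and the helper names are hypothetical.

```python
def sequence_space(length: int, alphabet: int = 4) -> int:
    """Number of distinct sequences of a given length over the given alphabet."""
    return alphabet ** length


def max_coverage(length: int, pool_size: float) -> float:
    """Upper bound on the fraction of sequence space a pool of this size can contain."""
    return min(1.0, pool_size / sequence_space(length))


if __name__ == "__main__":
    ASSUMED_POOL = 1e14  # assumed number of molecules in a library; illustrative only
    for n in (22, 36):
        print(f"{n}-mer: {sequence_space(n):.2e} sequences, "
              f"coverage <= {max_coverage(n, ASSUMED_POOL):.1e}")
```

Under that assumed pool size, a 22-mer space can in principle be sampled completely, whereas a 36-mer space is sampled only sparsely, which is the contrast the text draws between the insertion libraries and the standard N36 library.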

We then designed a library of 22-mers (N22) with iBu.1 positioned next to the
stem. From this library, using elutions with leucine without the cofactor,
we identified Leu2.1 (Fig. 3C). The Leu2.1 aptamer had a K_D of almost 10 mM
(fig. S28) and an ~4:1 preference for Leu over isoleucine (Ile) (fig. S49).
The negative cooperativity (ΔG_C) between the carboxyl and isobutyl groups
was large (>10 kJ/mol), providing an explanation for our initial selection
difficulties (Fig. 3D).

mFold and (bottom) as an alternative secondary structure, which was
subsequently confirmed to be the active sensor structure. Structure switching
allows this aptamer to be captured on the column (the upper structure allows
capture) during the initial stages of selection (compare to fig. S61). A variant of
Vor1.0, Vor1.1.4 (which cannot be captured on the column and thus was not
isolated during selection), was turned into a quenching-FRET sensor and
responded to both 2 and 2a [fluorophores: F, fluorescein; T, TAMRA
(carboxytetramethylrhodamine)]. By using fluorescence, this sensor detected
voriconazole concentrations as low as 3 μM; thus, this oligonucleotide is a
candidate for engineering of electrochemical sensors for in vivo monitoring (12).

We identified homologous regions I to III in CpLeu1.0 and Leu2.1, two of
which, II and III in Leu2.1, originated from the inserted random region
outside of iBu.1. The short Leu2.1 aptamer should have been abundant in any
initial pool; furthermore, isolation of motifs II and III in control studies
of insertion reselection (fig. S50) indicated that motif I is not absolutely
required in Leu aptamers. However,
apparently because of its low affinity, Leu2.1 
required the prefixed compatible sequences 
within I to increase the probability of isolation 
through a reduction in the required sequence 
length in the newly inserted random region. 
The Leu2.1 aptamer had a millimolar target 
affinity insufficient for the intended appli- 
cation of testing newborns with MSUD (1). 
In addition, Leu2.1 preferred phenylalanine 
over leucine (fig. S49), which was a dominant 
problem in our prior selections that used 
Cp*Rh(III) as the cofactor (fig. S51). Thus, in 
the last step of the selection, we added an 
aminophilic Cu(II) (27), to improve affinity 
and selectivity. We hypothesized that Cu(II) 
would serve as a protecting group neutralizing 
the effects of the carboxylate through com- 
plexation with the 2-aminoethanoate group. 
Complexation would allow better access of 
hydrophobic DNA monomer residues to the 
leucine side chain, improving affinity and se- 
lectivity. Consistent with our hypothesis, we 
identified aptamer CuLeu1.0 as having a 44-
mer loop, conserved sections of 7Bu.1, and high
affinity for leucine (KD ~170 nM; Fig. 3C).
CuLeu1.0 had selectivity for leucine over
isoleucine, valine, and phenylalanine, but we
noted strong cross-reactivity with allo-isoleucine
(allo-Ile) (Fig. 3E and Newman projections in
fig. S52). The allo-isoleucine metabolite was 
not initially considered in the aptamer coun- 
terselections because its concentrations are 
negligible at birth. Newborn screening is cur-
rently performed with mass spectroscopy (11),
which integrates isobaric species to provide 
XLe values (where XLe is Leu + Ile + allo-Ile + 
2OHPro) (where 2OHPro is 2-hydroxyproline). 
Thus, our aptamer sensor is a candidate for 
the development of rapid tests to address false 
positives in MSUD by showing a lack of steady 
increase in X'Le values in consecutive mea- 
surements, with X’Le defined, for example, as 
[Leu] + 0.57*[allo-Ile] (Fig. 3F and fig. S53). 
After the first few days of life, however, allo- 
Ile concentrations increase, so a monitoring 
strategy without fully specific aptamers would 
require a cross-reactive array (26, 28), for which 
CuLeu1.0 is a suitable component.
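
The two quantities contrasted above can be made concrete with a short sketch. It assumes concentrations in micromolar and uses hypothetical function names; the 0.57 weight is the example factor quoted in the text, and the sample values are invented purely for illustration.

```python
"""Illustrative sketch of the XLe and X'Le quantities described in the text.

All function names and sample concentrations are hypothetical; only the
two formulas come from the text.
"""


def xle_mass_spec(leu, ile, allo_ile, hydroxyproline):
    """XLe as integrated over isobaric species: Leu + Ile + allo-Ile + 2OHPro."""
    return leu + ile + allo_ile + hydroxyproline


def xle_prime(leu, allo_ile, allo_weight=0.57):
    """X'Le as defined in the text's example: [Leu] + 0.57*[allo-Ile]."""
    return leu + allo_weight * allo_ile


def steadily_increasing(readings):
    """Return True if every consecutive reading is strictly higher than the last."""
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))


if __name__ == "__main__":
    # Invented (Leu, allo-Ile) concentrations for three consecutive measurements.
    samples = [(120.0, 1.0), (125.0, 1.0), (118.0, 2.0)]
    readings = [xle_prime(leu, allo) for leu, allo in samples]
    print(readings, steadily_increasing(readings))
```

In this sketch, a series of X'Le readings that does not rise steadily would correspond to the pattern the text associates with a false positive; any clinical threshold is outside the scope of the example.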

The multistep approach with Cu(II) can 
be generalized to amino acids that display a 
side chain away from the Cp*Rh(III) complex,
such as Ile (compare to CuIle1.1, figs. S54 to
S56). This approach would not work for amino 
acids that carry a chelating group beyond 2- 
aminoethanoate—for example, glutamate (fig. 
S57). For comparison, we performed a single- 
step Leu selection with Cu(II) as the cofactor. 
We isolated receptors with about fivefold-lower 
affinities than that of CuLeu1.0. The two most
abundant sequences preferred isoleucine or 
methionine (figs. S58 to S60). These aptamers 
are also candidates for arrays. 


A structure-guided approach to aptamers 
for voriconazole 


Leucine (1) is closely related to other amino 
acids in our target set. By contrast, voriconazole 
(2) is an example of applying a structurally 
guided approach to unrelated molecules. We 
initially attributed our voriconazole selection 
failures to its limited solubility (~200 µM). None-
theless, selections using a soluble voriconazole
phosphate analog also failed. We considered 
that voriconazole, similarly to leucine, has a 
sterically crowded structure (Fig. 4A) that 
forces its fragments (structural subunits) into 
a propeller-like conformation, as revealed in 
crystal structures (30). This sterically crowded 
conformation was hypothesized to lead to sub- 
optimal access to hydrophobic surfaces in DNA 
that are needed to interact with fragments I to 
III (Fig. 4A). One possible retrosynthetic discon- 
nection (37) led to a simplified, less-congested, 
and readily synthesized alcohol analog, 2a 
(Fig. 4A). 

Initial attempts starting with 2a at high 
concentrations—although introducing vorico- 
nazole separately in later cycles—failed, yielding 
exclusively analog-binding aptamers. Further 
conformational analysis using Newman pro- 
jections clarified that 2a likely presents a do-
minant epitope during selection in which
fragments I and II are positioned anti. Con- 
versely, in voriconazole, these structural subunits 
are gauche (Fig. 4A). Inspired by approaches 
to outflank the immunodominance of epito- 
pes (32), we mixed 2 and 2a at their respec- 
tive maximal soluble concentrations in the 
initial selection steps, only gradually phasing 
out the analog. We hypothesized that this pro- 
cedure would maximize the probability of 
release of aptamers that bind similar confor- 
mations of the target and its analog, which 
could be important in the initial rounds of 
selection. In contrast to previous failures, this 
change led to two aptamers (figs. S61 and S62) 
responsive to 2 and 2a (Fig. 4B), confirming 
the advantage of adding the analog. 

The mechanisms underlying the improved 
selection strategy for voriconazole are partially 
unclear because we cannot exclude the possi- 
bility that the analog minimizes target ag- 
gregation. Nonetheless, the presence of 2a is 
certain to improve target-receptor occupancy 
in the initial cycles, likely buttressing low effec- 
tive concentrations of monomeric voriconazole 
in conformations that can elicit aptamers. The 
isolated aptamers do not bind fluconazole, sug- 
gesting they are not class-wide cross-reactive 
aptamers (17, 28, 29) and further confirming 
that stabilizing interactions occur with group 
III in 2 (Fig. 4A). 

Mutagenesis studies indicated that our lead 
aptamer, Vor1.0, is a destabilized three-way
junction (Fig. 4B), which we engineered into 
a fluorescence resonance energy transfer (FRET) 
sensor, Vor1.1.4 (Fig. 4B). The latter shows suf- 
ficient sensitivity for testing as an electrochem- 
ical sensor for in vivo use (12). This specific 
family of voriconazole-binding three-way junc-
tions, despite being common, is eliminated
from direct selections by exceptionally poor 
interactions with capture oligonucleotides, a 
problem that was prevented in Vor1.0 by struc- 
ture switching (Fig. 4B and fig. S61). These 
observations showcase the complex balance 
between positive and negative selection pres- 
sures in selection protocols. Our procedures, as 
demonstrated through leucine and voriconazole 
selections, shift selection balance in our favor 
by addressing probabilistic barriers assigned 
to crowded (and other nonoptimal) substruc- 
tures within targets. 


Conclusions 


In traditional organic synthesis, the functional 
group abstractions and their reactivities guide 
us through transformations involving rela- 
tionships between nuclei and electron clouds 
(33). In our structure-guided aptamer selec- 
tions, analogous concepts directed random 
searches through the space of complementary 
interactions between targets and aptamer re- 
ceptors. We developed several approaches that 
can be used in functional group-guided selections. 


These include insertion reselection, carrying- 
over and anchoring of partial motifs, the ex- 
panded use of metal complexes as “protecting” 
groups matched to targets, placeholders, cross- 
linkers, and the synthesis of simpler analogs 
designed to overcome steric, conformational, 
or solubility barriers. These approaches can 
be further studied, optimized, and combined 
with one another and with traditional proto- 
cols (3, 4), organic receptor cofactors (6, 34), 
and modified bases (35), while considering lib- 
rary designs (25), to enable isolation of high- 
quality aptamers and engineering of biosensors 
for previously inaccessible targets. 

There are further topics to which our ap- 
proach, once systematically expanded, is ex- 
pected to provide original insights. The first 
is the question of natural selection of complex 
functions in the hypothetical, preprotein, RNA 
world (36). Behaving as tinkerers (37), we re- 
used simple sequence pieces to find functions 
requiring more complex sequences, thus ex- 
panding the early work on the use of cofactors 
in RNA catalysis (38). Second, the approach 
that applies structural analysis of ligands to 
find optimal receptors could be inverted, com- 
bined with structural methods and insights 
from a large set of aptamers to improve our 
ability to design small-molecule drugs that spe- 
cifically modulate natural nucleic acid targets 
(39). And third, we provide a substantially ex- 
panded set of sequences with confirmed tar- 
get binding that could be used to improve 
training sets for computational designs of 
aptamers (40).


Body-based units of measure in cultural evolution


Roope O. Kaaronen*, Mikael A. Manninen, Jussi T. Eronen


Measurement systems are important drivers of cultural and technological evolution. However, the 
evolution of measurement is still insufficiently understood. Many early standardized measurement 
systems evolved from body-based units of measure, such as the cubit and fathom, but researchers have 
rarely studied how or why body-based measurement has been used. We documented body-based
units of measure in 186 cultures, illustrating how body-based measurement is an activity common to
cultures around the world. Here, we describe the cultural and technological domains these units are used 
in. We argue that body-based units have had, and may still have, advantages over standardized systems, 
such as in the design of ergonomic technologies. This helps explain the persistence of body-based 
measurement centuries after the first standardized measurement systems emerged. 


The ability to measure things is central to
human cultures. Throughout the history
of human cultural evolution, systems of
measurement have been products and
drivers of cultural complexity (1-4). Global
industry, technologies, and commerce, as well 
as science itself, are largely built upon inter- 
changeable units of measure. Standardization 
systems, such as the International System of 
Units, permeate the everyday lives of people 
across the globe. Some might say that modern 
times are built upon our ability to measure the 
world. But how does the current system com- 
pare with those from the past, and what role 
has measurement played in the development 
of human societies? 

Worldwide, many early standardized mea- 
surement systems are thought to have evolved 
from body-based units of measure (3, 4). For 
example, one of the earliest-known standard 
measures, the royal cubit of Old Kingdom 
Egypt (around 2700 BCE), evolved from the 
use of the natural cubit (the distance from one’s 
elbow to the tip of the extended middle finger) 
(5). Harappan measurement systems were in- 
fluenced by units such as the fingerbreadth 
(6), and various Ancient Mesopotamian mea- 
surement systems were abstracted from body- 
based units such as the foot, cubit, and pace 
(4). Traditional Chinese (7), Roman (8), Greek 
(3), Aztec (9), and Maya (10) measurement
systems also used body-derived standards for 
measurement. 

A unifying feature of past measurement sys- 
tems is the use of individually variable body 
parts as units of measure (1, 3, 4, 11). “Body-
based units” are here defined as those units
that are determined by using components of
the human body. We analyzed the use of body-
based units of measure in 186 cultures across
the world, describing common units and the
cultural domains in which they are used. Body-
derived yet standardized units of measure, such
as the British Imperial foot, are not included in 
our data, even if the etymology of these units 
suggests an earlier use as body-based units. 

Recent work has suggested that the cultural 
evolution of measurement can be character- 
ized as a series of stages, starting from practical 
and gestural comparisons between objects, 
proceeding through unequal comparisons and 
initial standardization, and followed by inter- 
related standardized units that form abstract 
and complex systems of measurement (7). How- 
ever, these are not historical stages that cul- 
tures transition through and leave behind, 
and units of various types may coexist (1). We
found that a recurrent pattern in historical 
and ethnographic data on measurement is 
that body-based units have persisted alongside 
standardized measurement systems. 

Not all cultures adopted standardized mea- 
surement systems to the same extent, and 
many cultures used body-based units well 
into the 20th and 21st centuries, hundreds to 
thousands of years after the first emergence 
of standardization. In the past, body-based 
measurement systems have often been de- 
scribed as primitive predecessors of stand- 
ardized units (12). We question this notion 
and illustrate how body-based measurement 
systems have offered various problem-solving 
solutions and adaptive advantages in the evolu- 
tion of human cultures and technologies (Fig. 1). 

Drawing on our ethnographic dataset, we dis- 
cuss potential cognitive-cultural causes for the 
long-term persistence of body-based measure- 
ment, documenting mechanisms by which 
body-based units have proven to be successful 
and competitive with standardized systems. 


Results 


We documented body-based units in 186 cul- 
tures (Fig. 2A). Table 1 lists the most common 
units. Variations of the fathom, hand span, and 
cubit are most frequent and exhibit striking 
similarities between cultures around the world 
(Fig. 2, B to D). We also found 62 cases of 
activity-based units of measure. These are units
based on bodily activity, such as “a day’s travel
by foot” (measure of distance), or “a day’s plow-
ing” (measure of area).

Cultures in our dataset are coded on the 
basis of their inclusion in the Standard Cross- 
Cultural Sample (SCCS) to mitigate Galton’s 
problem (see materials and methods for fur- 
ther discussion). In total, our dataset includes 
evidence of body-based measurement in 99 
SCCS cultures (around 53% of all SCCS cul- 
tures). The SCCS subset allows us to better 
estimate the independent use of specific body- 
based units (Table 1). In the SCCS subset, the 
fathom (44 observations; 23.7% of all SCCS 
cultures), hand span (41 and 22%, respective- 
ly), and the cubit (40 and 21.5%) are the most 
frequent body-based units, suggesting that 
these units appear most commonly in hu- 
man cultures (their frequency might also be a 
product of remarkably distant common orig- 
ins). These estimates are only lower bounds 
because body-based measurement has often 
gone undocumented. 
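
The percentages quoted above follow directly from the counts and the 186 cultures in the SCCS. A minimal sketch of that arithmetic, with hypothetical variable and function names, is:

```python
"""Recompute the SCCS percentages quoted in the text from the stated counts."""

SCCS_TOTAL = 186  # number of cultures in the Standard Cross-Cultural Sample

# Counts quoted in the text for the SCCS subset.
sccs_counts = {"fathom": 44, "hand span": 41, "cubit": 40}


def percent_of_sccs(count, total=SCCS_TOTAL):
    """Return count as a percentage of all SCCS cultures, to one decimal."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for unit, count in sccs_counts.items():
        print(f"{unit}: {count} cultures, {percent_of_sccs(count)}% of SCCS")
    # Share of SCCS cultures with any documented body-based unit (99 of 186).
    print(f"any body-based unit: {percent_of_sccs(99)}%")
```

Because body-based measurement has often gone undocumented, these shares are best read as lower bounds, as the text notes.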

In Table 2, we present a typology that de- 
scribes the behavioral and cultural domains 
in which body-based units are used. We found 
body-based units especially common in the 
design of technologies, which highlights the 
important role of body-based units in tech- 
nological evolution. We also document note- 
worthy use of body-based measurement in 
trade, agriculture, and rituals. Body-based
units are mostly one-dimensional measures 
of length. However, cases of measuring area, 
volume (e.g., handfuls), and temperature are 
also documented. 

Body-based units are found on all inhabited 
continents (Fig. 2A). Our results suggest that 
cultures around the world use very similar 
units (Fig. 2, B to D). Body-based units are 
mostly used in specific contexts, such as the 
measurement of a particular technology. How-
ever, our dataset also documents elaborate,
domain-general systems of body-based measure- 
ment, such as those used among the Maori, Mara, 
Siwai, Trobriand, Iban, Katu, Kwakwaka’wakw, 
and Chuuk cultures. 

Figure 3 depicts the temporal distribution 
of the evidence of body-based measurement 
per each cultural region in our dataset. We 
found ample evidence for the use of body-based 
units in the 20th century. According to global 
reviews on historical metrology (3, 4), most 
cultural regions had encountered standardized 
units of measure prior to the 20th century. 
Table S1 and Fig. 3 document, for each cultural 
subregion in our dataset, plausible early dates 
for the introduction of standardized measure- 
ment systems. Our dataset supports the general 
claim that body-based measurement systems 
have persisted despite potential access to stan- 
dardization (Fig. 3). 

Definitive claims on culture-specific reten-
tion of body-based units are difficult to make
because the first emergence of standards often
pre-dates the categorization of contempo- 
rary cultures, and culture-level evidence on 
encounters with standardized measurement 
systems is sometimes lacking. However, we 
surmise that within cultural regions, such 
contact would often occur; therefore, knowl- 
edge of standardization would spread, and in 
many cases cultures could opt to adopt nearby 
standard units if they deemed them necessary 
or superior. 

In certain cases, the retention of body-based 
units is more obvious. For instance, in the 
Middle East, where some of the first known 
standardized measurement systems evolved 
three to five millennia ago (3, 4), body-based 
units have been documented as late as the 21st 
century (Fig. 3). Similarly, in various European 
regions, the first emergence of standards dates 
to the Roman Republic or Hellenistic Greek 
eras or even prehistoric times (13), but body-
based units are still documented from the
Middle Ages to the 1900s (table S1 and Fig. 3). 
In an exemplary case, the Zapotec used body- 
based units in the mid-to-late 20th century, 
even though Spanish standards were well-known 
at the time, and standards such as the vara 
(the rod) were introduced centuries earlier (14).
The Zapotec have even named some of their 
body-based units after Spanish standards (14). 
Our dataset documents similar cases of reten- 
tion in Hawaiian, Turkish, Yup’ik, Palestinian, 
and Mapuche cultures. Moreover, as discussed 
below, body-based measurement is still used in 
some contexts in the industrialized West. 


Discussion 


From the dataset, we identify four cognitive- 
cultural mechanisms that help explain why 
body-based units have been used to begin with, 
and why they were still often preferred to stan- 
dardized units up until the recent past. 


1. Ergonomic design 


Body-based units have the advantage that they 
provide custom-made ergonomic designs in 
ways that standardized systems often overlook 
(Fig. 1). We find references to ergonomic de- 
sign by body-based units of measure in 25 cul- 
tures (Table 2). We take indigenous ergonomics 
to be an especially favorable domain for the 
use of body-based units. Erased by the industrial 
revolution, ergonomics largely reemerged in the 
Western world only after World War II (15). 
Illustrative evidence of ergonomic design is 
found in kayak building. A responsive kayak 
requires proper positioning of the body. Con- 
sequently, no one-size-fits-all design serves all 
kayakers. Kayaking cultures, including the 
Yup’ik (16) and Greenlandic Inuit (17), have
used body-based units to correct kayak designs
for interpersonal variation. Kayaks were typi-
cally designed “by and for their user for the
best possible performance” (18, p. 5), to ensure
“perfect fit between the kayak and its maker”
(16, p. 91). Yup’ik kayaks were designed with
various body measures (16, 19) (Fig. 1). Similar
methods are used in the design of paddles: A
common length for a double-bladed Greenland
paddle is the user’s fathom plus one cubit,
and the blade width is determined by the max-
imum breadth that one can grip (17).


Fig. 1. Examples of objects designed with body-based units. (Top left) Karelian skis, early 1900s. The
gliding ski was the user’s fathom plus six spans (36). (Top right) Mapuche ponchos were measured
from the neck to halfway between the waistline and knee and from neck to thumb with arm outstretched
(26). (Center) Yahi bow, early 1900s. The bow’s length was from the opposite hip joint (X) to the tip
of the outstretched arm (Y) (37). The width below and above the hand grip was four fingers for a
powerful bow. (The posture pictured is not a typical Yahi shooting position.) (Bottom) Yup’ik kayak
from the Alaskan coast, late 1800s. The kayak’s length was two fathoms (B), plus one half-fathom
(C), plus the length of the cockpit, which was the length of an arm with a closed fist (D) (19). The kayak’s
height at the cockpit was one cubit with closed fist (A). The kayak’s width was two cubits. [Images:
Ski: National Museum of Finland (CC-BY 4.0). Poncho: Wikimedia commons (CC BY-SA 2.0), by Pontificia
Universidad Católica de Chile. Bow: Internet Archive (identifier: yahiarcherysaxton00poperich). Kayak:
Internet Archive (identifier: eskimoberingstrait00nelsrich). Human models: MakeHuman. Hand:
Wikimedia commons (FAL 1.3), by J.N.L.]

Body-based units have also guided the de-
sign of tools such as skis. For example, a Khanty
ski maker might measure ski width with their
outstretched “finger-and-thumb span plus two
fingers” and ski length from the ground to their
eyebrows (20, p. 159). This affords ergonomic
balance: Too narrow skis would sink into soft
snow, and too wide skis would be cumbersome
and carry excessive snow. Sixteenth-century evi-
dence suggests that the length of Saami skis
was the height of the user for the kicking ski
and the user’s height plus the length of their
foot for the gliding ski (27) (see also the Karelian
skis in Fig. 1). In contemporary skiing cultures,
it is still commonplace to use one’s own height
to determine ski and pole length. For repeti-
tive and injury-prone practices such as farming,
ergonomic tool design is especially important
(14). Various Zapotec tools were measured
with the user’s own body measures, such as
the Zapotec vara (fathom), to ensure that the
farmer’s tools (e.g., plows and axes) were custom-
made and therefore ergonomic (14).
Weapons such as bows also require ergo-
nomic design to ensure proper shooting form
and draw length. Body-based bow design is
found in various North American indigenous
cultures. For instance, Ojibwe bow length
varied with the stature of the bow’s owner,
measuring from “the point of the shoulder
across the chest to the end of the middle
finger of the opposite hand” (22, p. 146) (see
also Yahi bow design in Fig. 1). Similar design
is found in Europe, as documented in Edward
IV’s orders for every Englishman between 16
and 60 years of age to construct a longbow of
their own height plus one fistmele (width of
fist with thumb extended) (23). Even today,
body-based units are being used in archery and
bowhunting. A description of Yup’ik spear
throwing highlights the importance of an ergo-
nomically sized weapon (16, p. 138):


They can use one yagneg (arm span)
to measure when they make a nanerpak
[seal spear]. People use their own
body measurements. If a person uses a
spear of someone taller than he is, it will
be too long for him, and he will throw
it differently. But when they make it to
size, it can hit the target when thrown.


Custom-tailored clothes and footwear are
also made using body-based units. An illustra-
tive example is the case of Mapuche poncho
design (Fig. 1). Today, even tailors in commer-
cial economies use body measures to ensure
the custom fit of garments.


Table 1. Body parts used for measurement. The fifth column counts the fourth column as a proportion of the total 186 SCCS cultures, describing the percentage
of all SCCS cultures in which we have documented each body-based unit of measure. Incidence refers to the number of cultures with the specific unit. The parity in
the number of cultures (186) in our dataset and in the SCCS is coincidental. Entries marked [illegible] could not be recovered from the source; unit names in square brackets are inferred from the description.

Unit | Description and variations | Incidence, full dataset (N = 186) | Incidence, SCCS subset (N = 99) | Percentage of SCCS cultures
Fathom (arm span) | Distance between fingertips of outstretched arms. Variations include, e.g., the fathom with closed fists. | [illegible] | 44 | 23.7%
Hand span | [beginning illegible] thumb to the tip of one of any four other fingers on an outstretched hand. | 81 | 41 | 22.0%
Cubit (ell) | [beginning illegible] of an extended finger (typically the middle finger). Also sometimes measured to, e.g., the closed fist or wrist, or from elbow crease to fingertips. Other similar forearm-based units are included. | 76 | 40 | 21.5%
Arm length | [beginning illegible] typically from tip of outstretched fingers to one of the following: armpit, shoulder, or middle of chest (half-fathom). | 66 | 35 | 18.8%
[Activity-based units] | Activity-based measures such as a “day’s journey” or “stone’s throw” (linear measures) [illegible] “day’s worth of plowing” (measure of area). | 63 | 32 | 17.2%
Finger width | Width of one or multiple fingers (or fingernails), excluding the thumb (see “thumb width”). | [illegible] | [illegible] | [illegible]
Hand width | [beginning illegible] Also includes the width of four fingers or the fist, or the circumference of the palm. | [illegible] | 16 | 8.6%
[Pace] | [beginning illegible] pace, step, or slide. | [illegible] | [illegible] | [illegible]
Finger length | Length of any of the four fingers, thumb excluded (see “thumb length”). Includes the length of finger joints and combinations thereof. | 34 | 18 | 9.7%
[Height] | [beginning illegible] head, or to the tip of vertically extended arms. Also includes measures of height to other specified points of the upper body (e.g., navel, eyes, and forehead). | [illegible] | [illegible] | [illegible]
Foot | Inner or outer length of the foot. Also includes foot width. | [illegible] | [illegible] | [illegible]
Handful | Cupped hand (handful) or two cupped hands (double handful), a measure of volume. | 26 | 18 | 9.7%
Thumb width | The width of the thumb (including nail width). | 15 | 8 | 4.3%
[illegible] | Width of the fist with an extended thumb [remainder illegible]. | [illegible] | [illegible] | [illegible]
Thumb length | The length of the thumb or thumb joint(s). | [illegible] | [illegible] | 3.8%
[Hand length] | The length of a hand, typically from the wrist joint or [illegible] crease to the tip of the middle finger. | [illegible] | [illegible] | [illegible]
[Arm thickness] | As thick as the arm (or wrist). | [illegible] | [illegible] | 2.7%
[Armful] | As much as a person can carry in both arms (a measure of volume), or the circumference that the arms can surround. | 7 | 6 | 3.2%
[Pinch] | [beginning illegible] measure for volume measured by pinching the thumb [remainder illegible]. | [illegible] | [illegible] | [illegible]
Leg length | The distance from the sole of the foot to the knee or hip. | [illegible] | [illegible] | 1.1%
[illegible] | Measure of circumference made by pinching the tip of a finger to the thumb (similar to the “OK” or “ring” gesture). | [illegible] | [illegible] | [illegible]
Leg thickness | As thick as (any part of) the leg. | 3 | 2 | 1.1%


Fig. 2. Cultures in the dataset on a world map. The maps illustrate the
widespread practice of body-based measurement. Each diamond represents a
culture in the dataset. Cultures included in the SCCS are colored blue, other
cultures are colored red. The map in (A) depicts the distribution of all documented
cultures with body-based units of measure. The other maps illustrate the three
most common body-based units in the dataset: the fathom (B), cubit (C),
and hand span (D). The locations of cultures are based mostly on eHRAF
(Human Relations Area Files) coordinates. Locations are only rough estimates
because many cultures and ethnic groups are geographically widespread


These findings suggest that body-based units
and indigenous ergonomics have played an
important, typically overlooked role in the
design and evolution of technologies world-
wide. Not unlike cultures of today, cultures in
the past have struggled with ailments caused
by repetitive and intensive activities (24), and
reducing strain through functional design
would have been essential for them. 
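
To illustrate how such designs scale with the maker's own body, the proportions reported for the Yup’ik kayak and Karelian gliding ski (Fig. 1) and the Greenland paddle can be written as simple functions of personal measures. This is only a sketch: the class and function names are hypothetical, the centimeter values in the usage example are invented, and real builders applied further adjustments that are not modeled here.

```python
"""Sketch: object dimensions derived from a maker's own body measures."""

from dataclasses import dataclass


@dataclass
class BodyMeasures:
    """One person's measures in centimeters (example values are hypothetical)."""

    fathom: float             # fingertip to fingertip, arms outstretched
    cubit: float              # elbow to tip of the extended middle finger
    cubit_closed_fist: float  # elbow to the closed fist
    arm_closed_fist: float    # length of an arm with a closed fist
    hand_span: float          # thumb tip to fingertip on an outstretched hand


def yupik_kayak(body):
    """Kayak proportions as described in the Fig. 1 caption."""
    return {
        # Two fathoms, plus one half-fathom, plus the cockpit length.
        "length": 2 * body.fathom + 0.5 * body.fathom + body.arm_closed_fist,
        "cockpit_height": body.cubit_closed_fist,
        "width": 2 * body.cubit,
    }


def karelian_gliding_ski_length(body):
    """Gliding ski: the user's fathom plus six spans."""
    return body.fathom + 6 * body.hand_span


def greenland_paddle_length(body):
    """Double-bladed paddle: the user's fathom plus one cubit."""
    return body.fathom + body.cubit


if __name__ == "__main__":
    maker = BodyMeasures(fathom=170, cubit=45, cubit_closed_fist=38,
                         arm_closed_fist=65, hand_span=20)
    print(yupik_kayak(maker))
    print(karelian_gliding_ski_length(maker), greenland_paddle_length(maker))
```

Running the same functions with a taller or shorter maker's measures changes every dimension together, which is the sense in which these objects are custom-fitted rather than one-size-fits-all.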


2. Motor efficiency 


Body-based units afford convenient motor rou- 
tines. For example, measuring slack items such 
as fishing nets or rope with standard rulers is 
impractical because they must be outstretched 
for each partial measurement and can be in- 
conveniently long. By contrast, manual mea- 
surement can be conducted with relative ease, 
using simple motoric procedures. Consider, for 
example, Samoan methods of measuring three- 
ply braid (25, p. 240): 


[T]he worker measures the braid
by holding one end with the left
hand and running it through the
right as he stretches the arms to full
length. The full arm span is called a
ngafa. The right hand holds the farthest
point of the first span and draws it into
the left hand which seizes the point.
The second span is run through and so
on until the number of spans or ngafa
are counted.


Fig. 3. Timeline of standardization and recorded body-based measurement. For each
cultural region in our dataset (based on the HRAF regional categories), we defined the
earliest-known case of standardized units of measure (blue points), the coverage dates of
ethnographic evidence for body-based measurement (red segments; darker segments
indicate greater amounts of evidence), and the most recent evidence for body-based
measurement (red points). These dates are defined and described in more detail in
table S1. [Figure axis: cultural region (e.g., Amazon and Orinoco; Eastern, Northwestern,
and Southern South America).]


Our dataset documents similar techniques 
of using fathoms to measure nets and ropes 
around the world. The influence of such prac- 
tices is still observable today: A standardized 
fathom is used for measuring water depth in 
the British Imperial system. A likely explana- 
tion for these similarities across cultures is 
the procedural ease by which the fathom suits 
the measuring of slack items. 
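
The procedure described above amounts to counting arm spans, so the measured length is the span count times the measurer's own fathom. A minimal sketch, assuming a hypothetical fathom expressed in meters and hypothetical function names:

```python
"""Sketch: measuring a slack item such as rope by counting arm spans."""

import math


def length_from_spans(span_count, fathom_m):
    """Estimated length in meters after counting full arm spans."""
    if fathom_m <= 0:
        raise ValueError("fathom_m must be positive")
    return span_count * fathom_m


def spans_needed(target_length_m, fathom_m):
    """Number of full arm spans needed to pay out at least the target length."""
    if fathom_m <= 0:
        raise ValueError("fathom_m must be positive")
    return math.ceil(target_length_m / fathom_m)


if __name__ == "__main__":
    # Hypothetical measurer with a 1.6 m fathom.
    print(length_from_spans(12, 1.6))
    print(spans_needed(20, 1.6))
```

The result is personal rather than standardized: two people counting the same number of spans will pay out slightly different lengths, which matters little when the person measuring is also the person using the item.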


3. Availability 


Body-based units have the advantage that their 
use does not require additional, and often 
cumbersome, measurement tools. This pro- 
vides access to easy measurement even for 
highly mobile populations. Availability is useful 
even in contexts where standardized measures 
exist. For example, as one Mapuche informant 
describes (26, p. 93): 


But I do not always have a meter measure 
handy; I know that my wima [the
length from Adam’s apple to tip of
fingers of an outstretched arm] is 
nearly a meter and I use it. 


4. Integration with local knowledge


The use of body-based units, unlike that of stan- 
dardized units, is often restricted to specific prac- 
tical tasks. Accordingly, body-based units often 
account for local information in ways that 
standardized units overlook. This is especially 
the case with activity-based units of measure. 

For example, the Nicobarese have conveyed 
canoe trip distances as quantities of young co- 
conut drinks consumed (27). Hydration is an 
especially important factor in the salt water of 
the Indian Ocean, and it would make practical 
sense to measure journey distances with re- 
quired hydration units. In addition, stand- 
ardized units of length such as nautical miles 
would not, on their own, account for local variation in
currents, weather, and wind conditions, which 
can all affect physical effort and travel time (and 
therefore, the amount of hydration required). 

Vernacular units may be more sensitive to 
local conditions, conveying relevant informa- 
tion that standardized measures disregard. 
For instance, the Ifugao have used the number 
of rests required as a measure of distance, 
which is reasonable given that the local moun- 
tainous terrain is highly variable, rendering 
standard linear measures less useful (28). Similarly,
in some cultures, land is measured in
terms of physical activity, such as a day’s 
worth of plowing, which also naturally ad- 
justs for variabilities in terrain quality (29). 
Such measurement units allow adaptation 
to practical local context in deliberate ways. 


These findings align with research suggesting
that context-specific counting systems can
have cognitive and practical advantages (30).

Lastly, standardized units of distance may 
simply not be very useful in everyday local 
lifeways. Local societies typically know their 
surroundings very well, so there would be 
little need for them to measure distances be- 
tween these points. For example, distances on 
the Ifaluk Atoll are so short and universally 
known by locals that “there is little need to 
discuss them” (31, p. 20). 


From rules of thumb to standardization 


Our data show that body-based units were still 
used worldwide in the 20th century, close to 
five millennia after the emergence of the first 
known standardized units. Our analyses sug- 
gest that considerable time lags existed be- 
tween the regional emergence of standardized 
units and the use of body-based units (Fig. 3). 
This may be the result of practical advantages, 
such as ergonomics and availability. 
Table 2. Behavioral and cultural domains in which body-based units of measure are used. The third column describes the incidence of the trait in the full
dataset (the number of cultures that the trait appears in). The fourth column describes the number of SCCS cultures in the dataset that are recorded with each trait.


Theme 


Description 


Technological domains 


Body-based units are used in the design, measurement, 
or weaving of garments or cloth. Includes textiles, 
clothes, footwear, and other wearable items. 


Body-based units are used in the design or construction 
of buildings or other infrastructure. Includes carpentry. 


Body-based units are used in the design or 
construction of weapons (e.g., bows and spears). 


Body-based units are used in the design or construction of 
transport-related technologies (e.g., kayaks, canoes, 
boats, skis, equestrian items, and sleds). 


Body-based units are used in the design or construction 
of other household items, such as mats, pottery, utensils, and looms. 


Body-based units are used in the context of fishing 
(also, e.g., crabbing, shellfish, and harvesting), such as the 
measurement of fishing nets, lines, hooks, and harpoons. 


Body-based units are used in the design or construction of 
agricultural technologies, such as scythes or plows. 


Body-based units are used in the design or construction of 
musical instruments. 


Body-based units are used for trade, in markets and barter, 
or for measuring units of currency. 


Body-based units are used in agriculture (or horticulture), 
e.g., in measuring cultivated land or agricultural products, 
or distance between sowed seeds. 


Body-based units are used in ritual, ceremonial, religious, burial, 
or divination purposes. 


Body-based units are used to measure the size (or value) of 
livestock and other animals. 


Body-based units measure linear distance (one-dimensional; 
between two points). 


Instances where body-based units of measure are mentioned to be used 
in designing custom-sized (ergonomic) technologies. 


Temperature 


Instances where the body is used to measure temperature 
(e.g., when something is “too hot to touch” or of “body temperature”). 


Another potential (not mutually exclusive) 
explanation for the persistent use of body-based 
units is cultural inertia. Cultural innovations 
are often slow to spread, and new formal in- 
novations that require auxiliary technologies 
and standardization are often delayed in their 
cultural diffusion. This traction is well docu- 
mented in histories of measurement (2, 32). 

We suggest that pressures for standardiza- 
tion grow mainly in large-scale societies and 
particularly in intercultural states and com- 
merce. We therefore raise the possibility that 


Kaaronen et al., Science 380, 948-954 (2023) 


the transition from body-based units to stan- 
dardized ones often spread as a case of “seeing 
like a state” (33) and not only for practical pur- 
poses: Standardized measurement systems were 
cognitive-cultural inventions that enabled seam- 
less statecraft. The early use of standardized 
units typically revolves around governance 
and administration (32), whereas body-based 
units are more often used by manual workers 
and artisans (14, 16). Statecraft-related activities
such as intercultural commerce, regulation,
and taxation would have demanded
standardization and divisibility in ways that
body-based units of measure could not deliver. 
This would also explain why standardized units 
primarily emerge through the influence of em- 
pires and large states (table S1). 

Idiosyncratic rules of thumb could not co- 
exist with the demands of mass production. 
This is evident in industrialist Taylorist princi- 
ples, which were antagonistic toward “inefficient 
rule-of-thumb methods” (34, p. 16). Even if body- 
based measurements could serve manual work- 
ers, they could not be adapted to the strict 




requirements of factory workflows. The move
from body-based measurement systems to standardized
and abstract systems therefore reflects
a larger break in human cultural evolution,
one that has seen production systems evolve
from local and heterogeneous to global and
homogenous. As a consequence, traditional
units of measure are endangered in the broader
cultural extinction event (35) that has followed
globalization, industrialization, and colonization.
Tracking C-H activation with orbital resolution
Transition metal reactivity toward carbon-hydrogen (C-H) bonds hinges on the interplay of electron
donation and withdrawal at the metal center. Manipulating this reactivity in a controlled way is 
difficult because the hypothesized metal-alkane charge-transfer interactions are challenging to access 
experimentally. Using time-resolved x-ray spectroscopy, we track the charge-transfer interactions 
during C-H activation of octane by a cyclopentadienyl rhodium carbonyl complex. Changes in 
oxidation state as well as valence-orbital energies and character emerge in the data on a femtosecond 
to nanosecond timescale. The x-ray spectroscopic signatures reflect how alkane-to-metal donation 
determines metal-alkane complex stability and how metal-to-alkane back-donation facilitates C-H 
bond cleavage by oxidative addition. The ability to dissect charge-transfer interactions on an orbital 
level provides opportunities for manipulating C-H reactivity at transition metals.


The transformation of saturated hydrocarbons under mild conditions into
more valuable products constitutes a long-standing challenge in
chemistry (1-4). Photoinitiated reactions of transition metal carbonyl
complexes with alkanes have long served as fruitful model systems (5, 6),
providing detailed insights into the cleavage mechanism of strong C-H
bonds at a metal center (1, 2, 4, 7). In these systems, photoinduced
ligand loss is known to create a highly reactive species with an
undercoordinated and electron-deficient metal center (Fig. 1A). The metal
then rapidly binds an alkane from solution to form a σ-complex, in which
the metal coordinates to one or more C-H σ-bonds. Ultimately, metal
insertion between C and H atoms breaks the C-H bond to form a metal alkyl
hydride product. The σ-complex intermediates have been extensively
studied over the past several decades to probe their molecular structure
and mechanistic role (8-20).
Quantum chemical calculations, in particular, suggest that the
metal-alkane bond in σ-complexes is formed by donation of electron
density from the occupied C-H σ-orbital into unoccupied metal d-orbitals
concomitant with back-donation from occupied metal d-orbitals into the
unoccupied antibonding C-H σ*-orbital (21-24) (similar to, albeit
substantially weaker than, metal-carbonyl bonds, as illustrated in
Fig. 1B). Both types of interactions simultaneously enhance metal-alkane
bonding and weaken the alkane C-H bond. Because it is the balance of
back-and-forth charge transfer via different orbitals that determines
whether a σ-complex ultimately proceeds to C-H bond cleavage, dissecting
individual charge-transfer interactions could provide orbital-based
design principles as a guide for catalyst development.
Experimentally, time-resolved infrared (IR) 
spectroscopy has been instrumental in iden- 
tifying reaction intermediates in C-H activa- 
tion (17) by probing shifts in infrared marker 
modes of spectator ligands. Such shifts are the 
result of changes in spectator-ligand bond 
strengths induced by changes in the integrated 
charge-transfer interactions in the complex. Sepa- 
rately accessing donation and back-donation 
to and from the metal in a σ-complex, however,
would be a way to experimentally correlate 
individual orbital interactions with reactivity 
toward C-H bond cleavage (7). 

In this work, we demonstrate a distinct way to experimentally evaluate
metal-ligand charge-transfer interactions during C-H activation at metal
complexes. Using time-resolved x-ray absorption spectroscopy (XAS) at the
metal L-edge (11, 25-30), we probe the short-lived reaction intermediates
from the vantage point of the reactive metal site to interrogate the
decisive charge-transfer interactions that determine the overall
reaction. In two ultraviolet (UV)-pump and x-ray-probe experiments at the
Swiss Free Electron Laser facility (SwissFEL) and the Swiss Light Source
synchrotron radiation facility (SLS), we track σ-complex formation and
oxidative addition using CpRh(CO)2 (where Cp is cyclopentadienyl)
(10, 18-20) in octane solution. The time-resolved Rh L-edge absorption
spectra were recorded by collecting the x-ray fluorescence as a function
of incident x-ray photon energy around the Rh L3 absorption edge
(Fig. 1C; see supplementary materials for experimental details). As is
the case for other 4d transition metal complexes (27, 31, 32), the Rh
L3-edge transitions can be assigned to excitations of Rh 2p core
electrons to unoccupied molecular orbitals (see Fig. 1D). Changes in
transition energies reflect changes in orbital energies, whereas
oscillator strengths vary with the degree to which Rh 4d and ligand
orbitals hybridize. In combination with our calculations, the data
provide direct access to back-and-forth charge-transfer interactions
along the C-H activation reaction trajectory at the level of individual
orbitals.


Time-resolved XAS of C-H activation 


The steady-state Rh L3-edge absorption spectrum of CpRh(CO)2 shown in
Fig. 1D exhibits a peak at a photon energy of ~3006 eV that results from
excitation of Rh 2p core electrons into the lowest unoccupied molecular
orbital (LUMO), the empty 4d-derived orbital of the Rh(I) d8 ground state
configuration. The second peak at ~3007.5 eV is assigned to transitions
of Rh 2p electrons into unoccupied orbitals of mainly CO and/or Cp ligand
character. Through metal-ligand back-donation, these ligand-derived
orbitals acquire Rh 4d character and become accessible by the Rh 2p→d
dipole transitions in L3-edge XAS (33). Upon laser excitation, as seen in
the difference spectrum recorded at a pump-probe time delay of 250 fs, a
pre-edge peak appears at ~3002.5 eV together with substantial bleaching


Table 1. Mulliken charge and orbital properties of the CpRh(CO)-octane and
Rh(acac)(CO)-octane σ-complexes (B3LYP level of theory).

Columns: σ-complex; Rh Mulliken charge; LUMO character [Rh 4d (%), Cp or acac (%), CO (%), Octane (%)]
CPRA(CO)-octaNe un OE ee cereonet Se moatiatuee Eo © serene a eter 6.2 
Rh(acac)(CO)-octane 0.46 522 152 ies 2 
of main-edge features (Fig. 1D). The temporal evolution of the pre-edge
peak intensity, shown as a time trace in Fig. 1E, is well described by a
biexponential decay to a metastable species (reduced χ² = 1.11; see
supplementary materials for a kinetic model). The two time constants are
assigned to CO dissociation from excited states of CpRh(CO)2 within
370 ± 50 fs followed by octane association within 2.0 ± 0.1 ps. These
assignments agree with the timescales for ligand substitution in other
metal carbonyls from previous femtosecond measurements (28, 34). Our
experiment establishes the timescale of formation of the CpRh(CO)-octane
σ-complex, and the spectrum at 10 ps in Fig. 1D constitutes a direct
fingerprint of how metal-ligand charge-transfer interactions change upon
substituting a CO with an alkane


ligand. On nanosecond timescales, the disappearance of the σ-complex
pre-edge peak (time trace at 3004.4 eV in Fig. 1F) and the simultaneous
emergence of a positive absorption feature (time trace at 3006.6 eV in
Fig. 1F and transient spectrum at nanosecond delay times in Fig. 1D)
reflect how the metal-ligand charge-transfer interactions further change
upon C-H activation by oxidative addition. Both time traces are modeled
with a single exponential (reduced χ² = 1.01), yielding a time constant
of 14 ± 2 ns, which is in excellent agreement with the ~14 ns for C-H
activation of octane with CpRh(CO)2 from time-resolved IR
measurements (18).

Fig. 1. Mechanistic model and time-resolved XAS of C-H activation by
CpRh(CO)2 in octane solution. (A) Schematic of C-H activation by
CpRh(CO)2 via photoextrusion of CO followed by alkane complexation and
oxidative addition. hv, UV photon. (B) Orbital-specific metal-ligand
charge-transfer interactions for metal-alkane and metal-carbonyl bonds.
(C) Schematic of the experiment with UV-laser pump pulses triggering the
reaction and x-ray pulses probing orbital evolution as a function of time
delay between pump and probe pulses. Reaction intermediates and products
[as well as ground-state CpRh(CO)2] are characterized by detecting the Rh
fluorescence as a measure of the Rh-specific x-ray absorption (see
supplementary materials). (D) Steady-state and transient Rh L3-edge
absorption spectra at indicated pump-probe time delays as well as a
schematic depiction of the L-edge absorption process [difference spectra
are plotted relative to the edge-jump of the steady-state spectrum
(intensity at 3015 eV), which is normalized to 1; steady-state and
difference spectrum at delays >190 ns are scaled for illustration]. Rel.
abs., relative absorption. (E and F) Time traces (intensities versus time
delay) measured at indicated x-ray photon energies with (E) femtosecond
and (F) picosecond time resolution. In (E), the gray, orange, and purple
shaded regions represent the relative populations of the CpRh(CO)2
excited state, the CpRh(CO) fragment, and the CpRh(CO)-octane σ-complex,
respectively.
[Panel labels: photon energy (eV) versus relative absorption change;
delay (ns); time traces at 3002.8, 3004.4, and 3006.6 eV.]
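The time constants reported above can be turned into a rough population picture. The following is a minimal sketch that assumes a strictly sequential first-order scheme (excited CpRh(CO)2, then the CpRh(CO) fragment, then the σ-complex, then the alkyl hydride); the authors' actual kinetic model is in their supplementary materials and may differ. The function names are hypothetical, and only the three time constants come from the text.

```python
import math

# Time constants quoted in the text. The sequential A -> B -> C scheme
# below is an assumed simplification, not the authors' kinetic model.
TAU_DISSOCIATION_PS = 0.37  # CO loss from excited CpRh(CO)2
TAU_ASSOCIATION_PS = 2.0    # octane binding to the CpRh(CO) fragment
TAU_ACTIVATION_NS = 14.0    # sigma-complex -> alkyl hydride


def early_populations(t_ps):
    """Return (excited state, fragment, sigma-complex) fractions at t_ps >= 0."""
    k1 = 1.0 / TAU_DISSOCIATION_PS
    k2 = 1.0 / TAU_ASSOCIATION_PS
    excited = math.exp(-k1 * t_ps)
    # Standard solution for the intermediate of two consecutive
    # first-order steps (valid because k1 != k2).
    fragment = k1 / (k2 - k1) * (math.exp(-k1 * t_ps) - math.exp(-k2 * t_ps))
    sigma_complex = 1.0 - excited - fragment
    return excited, fragment, sigma_complex


def sigma_complex_remaining(t_ns):
    """Fraction of sigma-complex not yet converted after t_ns nanoseconds."""
    return math.exp(-t_ns / TAU_ACTIVATION_NS)


if __name__ == "__main__":
    for t_ps in (0.25, 1.0, 10.0):
        print(t_ps, early_populations(t_ps))
    print(sigma_complex_remaining(14.0))
```

Under these assumptions, nearly all molecules that lose CO have become the σ-complex by 10 ps, which is consistent with treating the 10-ps spectrum as the σ-complex fingerprint, while the conversion to the alkyl hydride plays out three orders of magnitude more slowly.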
Comparison with theory

This assignment of the transient x-ray absorp-
tion spectra is further validated and detailed
by the calculated spectra in Fig. 2A. Because
shapes and intensities of the measured spectra
are well reproduced, we can robustly assign
x-ray transitions to underlying charge-transfer
interactions (see supplementary materials for 
computational details and discussion of devia- 
tions between experiment and theory). We use 
the experiment-theory comparison to extract 
the orbital correlation diagram shown in Fig. 2B. 


Fig. 2. X-ray absorption signatures
of σ-complex formation and C-H activation
by oxidative addition. (A) Experi- 
mental spectra at time t = 10 ps and 
>190 ns (top) compared with 
calculated spectra of CpRh(CO)- 
octane and CpRh(CO)-H-R [middle, 
calculated on the B3LYP level of 
theory (43)]. L3-edge transitions and
spectra calculated for intermediate 
structures (bottom) illustrate the 
interconversion of spectral features 
from reactant to product along 

the C-H activation reaction coordinate 
(39). Calculated difference spectra 
are scaled such that the CpRh(CO)- 
octane difference spectrum matches 
the pre-edge intensity of the experi- 
mental spectrum at 10 ps. Vertical 
lines indicate positions of spectral 
fingerprints a, b, and c. (B) Correla- 
tion diagram between the valence 
orbitals of CpRh(CO)2, CpRh(CO)- 
octane, and CpRh(CO)-H-R detailing 
the interconversion of orbital 
energies and character upon ligand 
substitution and C-H activation. 


the antibonding counterpart of the 
bonding interactions that are sche- 
matically shown in Fig. 1B. For 
illustration, calculated orbitals are 
displayed with varying isovalues 
(see supplementary materials). 
(C) Calculated free energies (top), 
Rh 4d character of LUMO+1 and 
LUMO+3 orbitals (middle), and 


LUMO+1 and LUMO+3 orbitals 
(bottom) as a function of reaction 
coordinate of oxidative addition. 
arb. u., arbitrary units. 
Importantly, this correlation diagram, which is based on robust
experimental observations, relates orbital interactions from ligand
substitution and σ-complex formation to C-H bond breaking and oxidative
addition.

Two major effects in metal-ligand bonding of the CpRh(CO)-octane
σ-complex compared with CpRh(CO)2 are reflected in the 10-ps transient
spectrum. First, substituting the strong-field CO ligand with the weakly
interacting octane stabilizes (decreases the energy of) the Rh 4d-derived
LUMO orbital (Fig. 2B). This is directly reflected in a decrease of
2p→LUMO transition energies: The pre-edge peak that is due to 2p→LUMO
transitions is shifted to lower energy in the σ-complex (3004.2 eV)
compared with CpRh(CO)2 (3006 eV; Fig. 2A). Second, an overall reduced
degree of back-donation in the σ-complex compared with CpRh(CO)2 lowers
the hybridization of ligand orbitals with Rh 4d orbitals. This diminishes
intensities of the Rh 2p transitions to ligand-derived orbitals in the
σ-complex and causes the depletion in the main-edge region of 3006 to
3009 eV (Fig. 2A).
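To make the two effects concrete, the sketch below builds toy spectra from Gaussian-broadened transitions and subtracts them. The transition energies (3006, 3007.5, and 3004.2 eV) are taken from the text; the broadening width and all relative strengths are illustrative assumptions, chosen only so that the ligand-derived transition is weaker in the σ-complex. It is not a substitute for the calculated spectra in Fig. 2A.

```python
import math

WIDTH_EV = 0.7  # assumed Gaussian broadening (standard deviation)

# (transition energy in eV, relative strength). Energies are quoted in the
# text; every strength is an illustrative assumption.
GROUND_STATE = [(3006.0, 1.0), (3007.5, 1.0)]   # CpRh(CO)2
SIGMA_COMPLEX = [(3004.2, 1.0), (3007.5, 0.6)]  # CpRh(CO)-octane


def spectrum(transitions, energy_ev):
    """Sum of Gaussian-broadened transitions evaluated at energy_ev."""
    return sum(
        strength * math.exp(-((energy_ev - center) ** 2) / (2 * WIDTH_EV ** 2))
        for center, strength in transitions
    )


def difference(energy_ev):
    """Toy transient signal: sigma-complex minus ground state."""
    return spectrum(SIGMA_COMPLEX, energy_ev) - spectrum(GROUND_STATE, energy_ev)


if __name__ == "__main__":
    for energy in (3004.2, 3006.0, 3007.5):
        print(energy, round(difference(energy), 3))
```

In this toy model the difference signal is positive at the shifted pre-edge position and negative across the main edge, mirroring the qualitative pattern described for the 10-ps spectrum.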

For the subsequent C-H bond breaking and oxidative addition step from the
σ-complex to the metal alkyl hydride, calculations of the free-energy
landscape shown in the top panel of Fig. 2C suggest a barrier of
~7 kcal/mol and an exothermic reaction. The underlying reaction
coordinate is constructed from a Nudged-Elastic-Band (NEB)/TPSSh/Def2-TZVP
computation (35-37). Using the geometries of this reaction path scan, the
free-energy landscape was computed at the DLPNO-CCSD(T)/Def2-TZVP level
of theory (38). Although we do not experimentally observe the
intermediate structures along the reaction coordinate, the L3-edge x-ray
absorption spectra computed for these structures relate the spectral
changes from the σ-complex reactant to the metal alkyl hydride product,
which we observe. The key spectral fingerprint regions (denoted as a, b,
and c in Fig. 2A) can be assigned to excitations of Rh 2p electrons
predominantly into the LUMO, LUMO+1, LUMO+2, and LUMO+3 orbitals
(LUMO+4, +5, ... are not discussed because they contribute to a
negligible degree only). Changes of the features a to c hence report on
the combined transformations of the four lowest unoccupied orbitals upon
C-H bond breaking and oxidative addition. As detailed in the following
paragraphs, feature a reflects changes in metal-alkane orbital overlap as
metal-alkane bond distances change, feature b reports on the oxidation of
the metal, and feature c reflects changes in metal-ligand back-donation.
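As a rough plausibility check, which the source itself does not perform, a free-energy barrier can be converted into a time constant with the Eyring equation. The sketch below assumes a temperature of 298.15 K (not stated in the text) and a transmission coefficient of 1; the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
PLANCK_J_S = 6.62607015e-34
GAS_CONSTANT_KCAL_PER_MOL_K = 1.987204e-3


def eyring_time_constant_ns(barrier_kcal_per_mol, temperature_k=298.15):
    """Time constant (ns) for a first-order step from the Eyring equation.

    Assumes a transmission coefficient of 1; the default temperature is an
    assumption, not a value given in the text.
    """
    prefactor = BOLTZMANN_J_PER_K * temperature_k / PLANCK_J_S  # in 1/s
    rate = prefactor * math.exp(
        -barrier_kcal_per_mol / (GAS_CONSTANT_KCAL_PER_MOL_K * temperature_k)
    )
    return 1e9 / rate


if __name__ == "__main__":
    print(eyring_time_constant_ns(7.0))
```

With a barrier of 7 kcal/mol this estimate lands in the tens of nanoseconds, the same order of magnitude as the measured 14 ns. Because the rate depends exponentially on the barrier, an uncertainty of a few tenths of a kcal/mol already changes the result by a factor of about two, so the agreement should be read as qualitative.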

In line with previous work (39), our cal- 
culated reaction coordinate describes the C-H 
bond moving toward the Rh center and, at the 
same time, the C-H bond elongating and 
breaking until the individual Rh-C and Rh-H 
Fig. 3. X-ray orbital view of reactivity modulations in C-H activation by varying the ligand environments
in σ-complexes. (A) Schematic of the LUMO orbitals of CpRh(CO)-octane and Rh(acac)(CO)-octane
with variations in metal-ligand bonding (charge-transfer indicated by arrows) and their effect on the
affinity for oxidative addition (calculated free energies). Me, methyl. (B) Calculated difference spectra of
CpRh(CO)-octane and Rh(acac)(CO)-octane compared with transient difference L3-edge absorption spectra
measured at SwissFEL at a pump-probe delay time of 10 ps. For comparison, the experimental
Rh(acac)(CO)-octane spectrum is scaled to match the depletion of the CpRh(CO)-octane. This scaling is
validated by the excellent agreement with the calculated spectra, which are shown with the same scaling
as in Fig. 2A (see supplementary materials).


bonds are established (see schematic of the reaction coordinate in
Fig. 2A, bottom). As a consequence of these atomic rearrangements along
the reaction coordinate, the Rh 4d-derived LUMO orbital shifts to higher
energies because of increasing orbital overlap with the approaching C-H
group with minor changes in hybridization (Fig. 2B and fig. S7). The
corresponding increase of 2p→LUMO transition energies is directly
observed experimentally by the disappearance of the pre-edge feature a in
the spectrum upon transformation of the σ-complex to the alkyl hydride
product: The 2p→LUMO transitions shift to higher energy and merge with
the main edge of the spectrum, thereby contributing to the generation of
feature b in the spectrum of the CpRh(CO)-H-R alkyl hydride product.

LUMO+1 in the σ-complex is the energetically lowest ligand-derived
orbital with dominant CO π* character and with some Rh 4d admixture due
to Rh-CO back-donation (see orbital plots in Fig. 2B). Upon oxidative
addition, LUMO+1 shifts to slightly lower energy and, importantly, gains
considerable Rh 4d character (see calculated Rh 4d character in Fig. 2C).
The increase is so substantial that the Rh 4d character becomes the
dominating contribution. This can be interpreted as an effective
transformation of the former ligand-derived orbital into a second
unoccupied Rh 4d-derived orbital (in addition to the LUMO; see orbital
plots in Fig. 2B). The increase of Rh 4d character directly scales with
an increase of oscillator strength of the Rh 2p→LUMO+1 transitions
(Fig. 2C). Feature b hence emerges as a strong peak, drawing intensity
from both the LUMO+1 transforming into a second unoccupied Rh 4d-derived
orbital and the LUMO shifting to higher energy and merging with the main
edge (Fig. 2A). The emergence of feature b thus reflects the combined
electronic-structure effects of C-H bond cleavage (LUMO destabilization)
and oxidative addition (LUMO+1 transformation). In particular, the
oxidation of the metal center from a Rh(I) (d8) to a Rh(III) (d6)
configuration is evidenced by the emergence of two unoccupied Rh 4d
orbitals.
The increase of the Rh oxidation state substantially destabilizes LUMO+2
(Fig. 2B) and slightly reduces its Rh 4d character (see fig. S7). As the
second CO π* orbital, its destabilization thus directly reflects a
decrease in back-donation from the oxidized metal onto CO π*. This effect
has also been associated with the shift of CO marker modes to higher
energy upon C-H activation (17), consistent with our results. Finally,
LUMO+3 in the σ-complex constitutes the octane C-H σ* orbital, which
exhibits weak Rh 4d admixture because of low back-donation from Rh to C-H
(orbital plot in Fig. 2B). Back-donation, however, increases as the C-H
bond is broken and the covalent Rh-C and Rh-H bonds are formed, as
evidenced by the substantial increase in Rh 4d character in LUMO+3 (see
Rh 4d character in Fig. 2C and orbital plots in Fig. 2B). Accordingly,
the oscillator strengths of Rh 2p→LUMO+3 transitions also strongly
increase upon oxidative addition (Fig. 2C). Together with the transitions
into LUMO+2 shifting toward higher energies, this causes the formation of
the strong peak c in the alkyl hydride spectrum, which exhibits an
intensity similar to the steady-state spectrum of CpRh(CO)2 and one
considerably stronger than that in the σ-complex (Fig. 2A).
Experimentally, this is reflected in negligible intensities in the alkyl
hydride difference spectrum at the energies of feature c compared with
the strong bleaching in the transient σ-complex spectrum.


A more stable σ-complex


By experimentally observing individual charge-transfer interactions, we
verify the validity of orbital correlation diagrams along C-H activation
reactions that were previously derived from quantum chemical calculations
alone (40). Our approach allows us, in particular, to expand upon
established notions of charge-transfer interactions between the metal and
the C-H group by experimentally assessing the critical role of additional
orbital interactions between the metal and the spectator ligands. We
further demonstrate this here by evaluating how different σ-complexes
exhibit different reactivities toward C-H activation owing to specific
differences in orbital interactions as a result of their different ligand
environments. It has previously been shown that replacing the Cp moiety
with an acetylacetonate (acac) group leads to a stable σ-complex, which,
however, does not proceed to oxidative addition of the C-H bond (41). Our
calculations shown in Fig. 3A suggest a 4.2 kcal/mol stabilization of the
Rh(acac)(CO)-octane with respect to the CpRh(CO)-octane σ-complex.
Together with the endothermic free-energy profile we calculated, this
renders the C-H activated product unfavorable. We find the extra
stabilization of Rh(acac)(CO)-octane to be predominantly due to a higher
donation from the octane onto the Rh center. This stronger donation is
favored by the higher charge deficiency at the Rh in the case of the more
ionic bond between Rh and the acac group compared with the bond between
Rh and Cp (see the Mulliken charges in Table 1).

Our calculations predict this variation in ionicity and the related
variation in reactivity for C-H activation to manifest in the x-ray
absorption difference spectra of the two σ-complexes as shown in Fig. 3B.
Our experiment directly confirms this prediction. In quantitative
agreement with theory, the measured spectrum of Rh(acac)(CO)-octane shows
a higher pre-edge intensity than CpRh(CO)-octane (Fig. 3B). We find this
difference to be due to a higher Rh 4d character in the LUMO (at the
expense of a lower hybridization with the acac group; see Table 1), which
causes the more intense 2p→LUMO pre-edge transitions in
Rh(acac)(CO)-octane. This higher Rh 4d character, which directly
correlates with higher Rh ionicity, renders the reaction step to the
Rh(acac)(CO)-H-R species endothermic. A more charge-deficient Rh(I)
center, which in the case of acac has a lower propensity to be further
oxidized to Rh(III), is consistent with and extends established trends in
alkane oxidative addition (7). We thus establish a direct measure of how
the lower hybridization of Rh 4d with spectator-ligand orbitals in the
more ionic bond to the acac ligands modulates reactivity for C-H
activation by unfavorably changing the balance of charge-transfer
interactions that bind (alkane-to-metal σ-donation) versus those that
break the C-H bond (propensity for oxidation via metal-to-alkane
back-donation).

Our results demonstrate the value of time- 
resolved, metal-specific, L-edge x-ray absorp- 
tion spectroscopy for understanding, on an 
orbital level, which factors determine reactivity 
for C-H activation at a metal complex. We 
anticipate that our approach will be used in 
the future to systematically screen σ-complexes
and alkyl hydride reaction products to provide 
a distribution of valence orbital energies and 
character as measures of metal-alkane bond 
stability and propensity toward C-H activation 
with oxidative addition and, potentially, other 
mechanisms (40). With C-H activation ranging 
from nucleophilic to electrophilic, depending 
on the relative weight of charge donation and 
back-donation, the experimental observables
established here can be used to ascertain
where in the range of mechanisms a probed 
system lies. Such insight can then be used to 
pin the results from computational studies that 
correlate valence electronic structure with 
reactivity. We envision this approach to extend 
established trends for reactivity (7) by pro- 
viding experimentally verified correlations 
between metal-ligand charge-transfer inter- 
actions and reactivity for orbital-level control 
of C-H activation. 
3D PRINTING


A sinterless, low-temperature route to 3D print 
nanoscale optical-grade glass 


J. Bauer*, C. Crook, T. Baldacchini


Three-dimensional (3D) printing of silica glass is dominated by techniques that rely on traditional 
particle sintering. At the nanoscale, this limits their adoption within microsystem technology, which 
prevents technological breakthroughs. We introduce the sinterless, two-photon polymerization 3D 
printing of free-form fused silica nanostructures from a polyhedral oligomeric silsesquioxane (POSS) 
resin. Contrary to particle-loaded sacrificial binders, our POSS resin itself constitutes a continuous 
silicon-oxygen molecular network that forms transparent fused silica at only 650°C. This temperature is 
500°C lower than the sintering temperatures for fusing discrete silica particles to a continuum, which 
brings silica 3D printing below the melting points of essential microsystem materials. Simultaneously, 
we achieve a fourfold resolution enhancement, which enables visible light nanophotonics. By demonstrating 
excellent optical quality, mechanical resilience, ease of processing, and coverable size scale, our material 
sets a benchmark for micro- and nano-3D printing of inorganic solids.


The three-dimensional (3D) free-form
manufacturing of silica glass is dominated
by techniques that rely on particle-loaded
binders and sintering (1-4). However,
these impose several limitations restricting
their adoption within microsystem technology,
which prevents major technological
breakthroughs.

Silica glass has a softening point of 1100°C, 
which makes it historically challenging to struc- 
ture. However, its superior optical transpar- 
ency and thermal, chemical, and mechanical 
resilience make it one of the most important 
materials for modern engineering applications, 
which include micro-optics (5, 6), photonics 
(7-9), microelectromechanical systems (MEMS) 
(10, 11), and microfluidics and biomedicine
(12, 13). Established microsystems synthesis
routes (14) manufacture silica structures by
means of elaborate top-down process sequen- 
ces, which involve techniques such as 2D mask 
lithography, thermal oxidation, vapor deposi- 
tion, and etching, but these processes hardly 
translate to 3D designs. Recently, the free-form 
manufacturing of silica glass has greatly ad- 
vanced. However, the most advanced 3D print- 
ing and molding methods (12) still rely on 
melting or particle-sintering steps identical 
to ancient blowing techniques and established 
industrial processes. 

Nearly unconstrained 3D design freedom at 
nanometer resolution grants two-photon poly- 
merization (TPP) 3D printing (15) the potential to 
radically transform microsystem technology, 
which today is largely constrained to planar 
structures. However, TPP printing is based 
on the laser exposure of photosensitive materials,
which are most commonly polymers with
intrinsically variable optical (16) and mechanical
properties (17) and limited environmental
stability. TPP facilitates the in situ 3D printing 
of complexly shaped polymeric free-form micro- 
and nanostructures (18-20) directly on microchips.
If the same could be achieved with robust
silica glass instead of polymer, the technique 
could realize major breakthroughs within 
optoelectrical systems such as superior imag- 
ing devices (18, 21), optical MEMS (10, 17), and 
nanophotonic integrated circuits (19, 20), such as 
for the development of quantum computers (22). 

Recently, the TPP printing of silica glass has 
been demonstrated (2, 3); however, these ap- 
proaches are still based on particle-loaded 
sacrificial polymer binders with limited appli- 
cability. To remove the binder and fuse the 
silica particles into solid structures, several 
day-long sintering procedures under vacuum 
or inert atmosphere at 1100° to 1300°C are 
required. These temperatures lie above the 
melting points of many important engineering 
semiconductors, such as germanium, cadmium 
telluride, and indium phosphide, which are 
some of the most efficient materials for solar 
cells, infrared and fiber optics, lasers, and photo- 
detectors. The same applies to most metals 
used in electrical circuits. Thus, traditional 
particle-based silica glass resins are generally 
not capable of on-chip manufacturing. The 
only alternative, postprint assembly of micro- 
scale components, involves a multitude of chal- 
lenges (19) and can hardly compete with 
state-of-the-art assembly routes (14) that achieve
orders-of-magnitude higher throughput with 2D
and 2.5D techniques. In addition, particle-based 
TPP resins limit the printing resolution as fea- 
tures approach the length scale of the dispersed 
particles. The smallest reported free-standing 
features that are achieved with particle-derived 
TPP-printed silica are 0.4 µm in size (3); the
maximum resolution, i.e., the smallest resolvable
spacing of several features, is often at least 2
times as large and has not been reported. This
spacing is still insufficient for nanophotonic 
devices for the visible light spectrum, such as 
metalenses (27), 3D bandgap materials (23), 
and invisibility cloaks (24). Standard TPP with 
organic resins can print down to 100-nm-sized 
features (15). Optimized print setups and pre- 
cursor chemistries can already push below 
10 nm (15, 25), smaller than a single nanoparticle
of the existing silica-particle TPP resins. Similar 
limitations may also apply to the achievable 
surface quality. Ultimately, the development 
of dispersions from ever smaller particles is 
limited, and particle-based approaches may 
not be able to meet the continuously increasing 
capabilities of TPP processes. 

The thermal decomposition of organic and 
organic-inorganic hybrid polymers is a promis- 
ing particle-free alternative to manufacture 
inorganic materials. This approach is currently 
being widely studied for the TPP fabrication 
of a range of micro- and nanoscale ceramics. 
TPP printing and subsequent heat treatment 
with organic, preceramic, and sol-gel precursors 
manufactures 3D nanostructures with feature 
sizes down to <200 nm in glassy carbon (26), 
silicon oxycarbide (27, 28), and titania (29), as 
well as glass ceramics (30, 31), respectively. 
The latter can also be visibly transparent and 
have been used to print optical lenses (31-33),
albeit the optical transmission has not been 
reported. However, the sol-gel approaches are 
disadvantageous compared with the particle- 
loaded resins (2, 3) from a processing perspec- 
tive. They entail tedious preprint preparations, 
the hardened gel film state imposes printing 
constraints, and to densify the final material, 
the TPP-printed templates are also heat treated 
at 1000° to 1100°C (30-32). 

Polyhedral oligomeric silsesquioxanes (POSSs)
(34, 35) are hybrid organic-inorganic polymers
composed of cage-like silicon-oxygen frameworks
with a general formula (SiO1.5) close to
that of fused silica. However, POSS polymers 
have so far not been used to TPP print silica 
glass. At their corners, the POSS-cage mole- 
cules can bond to a large catalog of organic 
functional groups to enable polymerization 
into solids with greater resistance to temper- 
ature and oxidation than most purely organic 
polymers. POSS polymers have been studied 
for their suitability as templating materials 
for semiconductors within different lithography 
techniques (36-38). More recently, epoxy- 
functionalized POSS resins (39) have success- 
fully been applied in TPP printing. However, the 
reported efforts still focused on the synthesis 
of temperature-stable hybrid polymers rather 
than exploiting the POSS material platform 
as a precursor to manufacture purely inorganic 
materials. Thermal decomposition of printed
parts has been found to form glass ceramics
with organic impurities, and no optical properties
have been reported (39). Like sol-gel precursors,
epoxy-functionalized resins also constrain
prints because printing is performed within
spin-coated gel thin films, which limits structures
to low aspect ratios on flat substrates.

Fig. 1. Fabrication of high-quality fused silica nanostructures from an acrylate-functionalized POSS resin. (A) Schematic synthesis through TPP 3D printing
and subsequent thermal treatment at 650°C. (B to I) Micrographs of fused silica structures: (B) woodpile photonic crystal with inset optical true-color blue-violet light
reflection (front structure), (C) close-up top view of pattern from 97-nm-wide lines, (D and E) octet nanolattice composed of >5000 beams, (F and G) parabolic
microlenses, (H) 150-µm-tall multilens diffractive micro-objective with inset optical micrograph, and (I) close-up view of the nanostructured Fresnel lens element.
Scale bar in (C), 100 nm; all other scale bars, 10 µm.

We present a sinterless, low-temperature 
3D-printing route that fabricates complex trans- 
parent fused silica glass nanostructures (Fig. 1). 
We introduce a particle-free organic-inorganic 
POSS-glass resin engineered to use acrylate- 
functionalized POSS chemistry (i) to TPP print 
high-quality 3D structures in an unconstrained, 
facile, and reproducible manner and (ii) to convert
as-printed polymer templates into high-fidelity,
optical-grade SiO2 nanostructures
through low-temperature thermal treatment. 
We schematically illustrate the composition of 
our POSS-glass resin, the TPP printing of 
polymeric templates, and their conversion 
into fused silica in Fig. 1A. 


Resin formulation 


Our POSS-glass resin is a negative-tone TPP 
photoresist composed of three parts, each of which 
contributes a specific set of functionalities (fig.
S1): (i) 89 wt % acrylate-functionalized POSS
monomer, (ii) 9 wt % trifunctional acrylic
monomer, and (iii) 2 wt % photoinitiator of the
α-aminoketone family (40). The POSS monomer
was the main component, whose POSS-cage
cores constituted the silicon-oxygen nanocluster
source that enables the SiO2 conversion.
Its acrylic functional groups were essential
to achieve high-performance TPP. Acrylate- 
based resins are the most widely used TPP 
material class (41, 42) because of their pro- 
cessing ease and wide assortment of func- 
tionalities and monomer sizes (43). Contrary 
to epoxy or sol-gel TPP resins, the acrylic 
reaction kinetics (44) allow printing in a 
liquid state with a high polymerization rate 
(45). However, the rigid structure of POSS mono- 
mers generally prevents the formation of 
sufficiently cross-linked (15, 46) self-supporting 
Fig. 2. Materials characterization confirming treatment at 650°C creates
pristine fused silica glass. (A to C) Simultaneous TGA (A), DSC (B), and mass 
spectrometry (C) illustrate how the polymerized precursor’s organic compounds 
decompose between 350° and 650°C; monitored emissions correspond to the 
mass/charge ratios (m/z) of the molecular ions of the indicated substances. 
(D) Micro-Raman spectra after treatment at increasing temperatures show the 
conversion of as-printed templates into fused silica at 650°C. (E) Bright-field 
TEM images and a selected area diffraction pattern confirm a homogeneous 


TPP-printed parts. Reported epoxy-POSS TPP 
resins are limited to 10 to 60 wt % POSS 
loading (39). In our material, the conforma- 
tional flexibility of the small addition of the 
long-armed, branched trifunctional acrylate 
facilitates reproducible TPP printing despite 
the high POSS loading of 89 wt % and provides 
important resilience against cracking (47). 
This was key to printing structures with a suffi- 
ciently close packing of silicon-oxygen nano- 
clusters, which successfully converted to dense 
SiO2 at low temperatures. Furthermore, the
branched trifunctional acrylate’s concentra- 
tion allowed control over the resin’s viscosity 
(48). Acting as an eluent modulating the 
diffusion of radicals and dissolved molecular 
oxygen, this enabled the resin to print finely 
resolved features. The chosen photoinitiator 
induced copolymerization of the resin’s acrylic 
groups through light exposure. We selected it 
for its efficient radical generation quantum 
yield, nonlinear absorption, and primary radical 
reactivities at the excitation wavelength of 
780 nm of the TPP system we used (49, 50). 
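
The stated composition translates directly into component masses for a batch of resin. The following is a minimal sketch of that arithmetic only; the function name, the dictionary keys, and the 5 g batch size are illustrative assumptions and do not describe the mixing and heating procedure itself.

```python
# Weight fractions as stated in the text (89 / 9 / 2 wt %).
# The key names are illustrative labels, not supplier product names.
RESIN_WT_PERCENT = {
    "acrylate-functionalized POSS monomer": 89.0,
    "trifunctional acrylic monomer": 9.0,
    "alpha-aminoketone photoinitiator": 2.0,
}


def batch_masses(total_mass_g, wt_percent=RESIN_WT_PERCENT):
    """Return the mass in grams of each component for a batch of total_mass_g."""
    total_pct = sum(wt_percent.values())
    if abs(total_pct - 100.0) > 1e-9:
        raise ValueError(f"weight percentages sum to {total_pct}, expected 100")
    return {name: total_mass_g * pct / 100.0 for name, pct in wt_percent.items()}


if __name__ == "__main__":
    # Hypothetical 5 g batch, chosen only for illustration.
    for name, mass in batch_masses(5.0).items():
        print(f"{name}: {mass:.2f} g")
```

For the assumed 5 g batch this gives 4.45 g of POSS monomer, 0.45 g of trifunctional acrylate, and 0.10 g of photoinitiator.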

We synthesized the POSS-glass resin by 
means of a mixing and heating procedure 
of the above three components (51) and obtained
a clear, light-yellow liquid that is stable
at ambient conditions for several years
and readily usable for TPP printing. We opti- 
mized the final mixture’s compositional ratio 
to maximize its silicon-oxygen nanocluster 
content while retaining excellent printabil- 
ity, as confirmed by TPP-printed calibration 
grids (fig. S2). 


Facile fabrication of complex nanostructures 


TPP printing of 3D polymer-template struc- 
tures followed simple standard procedures (15) 
by using a commercial TPP system. Therein, 
the resin was drop cast onto fused silica or 
silicon substrates, and the printer’s magnifi- 
cation objective was directly immersed in 
the resin. The objective focused an ultrafast 
pulsed laser beam into the resin. Within the 
focal volume, simultaneous absorption of two 
photons by the photoinitiator molecules results 
in their homolytic cleavage and the formation 
of two radicals. These initiated the cross-linking 
of the monomers’ acrylate groups, which trans- 
formed the resin into a solid network that was 
composed of an organic matrix with embedded 
amorphous microstructure, free from detectable pores. (F) EELS data confirm
that the material is composed solely of silicon and oxygen with an atomic ratio
closely matching stoichiometric SiO2. (G) Measured diameters of disk-shaped
specimens (inset) after exposure to increasing temperatures show the linear
contraction of as-printed templates as they convert to fused silica; above
650°C, the final POSS glass retains perfect geometrical integrity up to 1200°C,
with 58 ± 1% of the as-printed size. (H) As-processed fused silica nanolattice
before and after high temperature exposure. Scale bars in (G) and (H), 20 µm.


silicon-oxygen POSS nanoclusters. The 3D 
structures were printed by in-plane scanning 
of the focused laser beam by means of galvanometer
mirrors and by three-axis motion of
the piezoelectric sample stage. In contrast to


reported TPP-printed epoxy-functionalized POSS 
(39), preceramic (29), and sol-gel (30) resins, no 
pretreatments, restricting immersion oil and 
spacer layers or similar were required. After 
printing, a 20-min-long isopropyl alcohol
development bath dissolved the remaining 
uncured resin. The fabricated specimens were 
either air dried or, for the case of the most 
delicate structures, supercritically dried to pre- 
vent damage from capillary forces. 

Moderate thermal treatment (fig. S3) to only 
650°C in an air atmosphere converted the 
as-printed polymer templates to fused silica 
structures. Accompanied by an isotropic linear 
contraction of ~40%, the elevated temperature 
decomposed and degassed the organic com- 
pounds, with the atmospheric oxygen removing 
the remaining elemental carbon. Therein, our 
POSS templates’ densely packed continuous 
silicon-oxygen molecular networks consti- 
tuted the crucial feature that circumvents 
Fig. 3. TPP-printed POSS glass enables the fabrication of high-quality
free-form micro-optical elements. (A) Free-standing, disk-shaped
specimens for optical transmission measurements through UV-Vis-NIR
microspectrophotometry. (B) Optical transmission data show transparency on
par with commercial fused silica and exceeding literature-reported fused
silica (3, 61, 73) from sol-gel, preceramic, and particle precursors, 3D printed
through TPP or digital light processing (DLP); indicated temperatures refer
to thermal treatments during manufacturing. The inset shows the area
where the UV-Vis-NIR signal was collected. (C) Atomic force microscopy data


the extreme temperatures that are otherwise 
required to sinter discrete silica particles to 
a continuum (1-3).

We demonstrate a variety of 3D fused silica 
glass micro- and nanostructures (Fig. 1, B to I) 
that outperform the resolution, structure quality, 
and coverable size scale of previously reported 
inorganic TPP-printed materials. We fabricated
woodpile photonic crystals composed of
97-nm-sized free-standing features (Fig. 1, 
B and C). This constitutes a fourfold improve- 
ment over existing TPP-printed fused silica (3) 
and matches the smallest reported features of 
inorganic TPP structures (30) in general. More- 
over, the feature quality we achieved substan- 
tially outperforms that of the previously reported 
from a flat disk show optically smooth surface finish. (D) Micropillar and a
measured compressive stress-strain curve demonstrating ultrahigh
mechanical resilience with 10-fold increased strength and stiffness over
TPP-printed polymer (17). (E) Aspheric aberration-corrected high-precision
microlenses as optical device demonstrators. (F) Optical profilometry
confirms near-ideal accuracy. (G) Images formed by the microlenses
of a resolution target demonstrate excellent imaging performance; inset
contrast intensity profiles show up to 700 lp/mm are resolved.


comparably resolved structures (30). The pho- 
tonic crystal we synthesized had a rod spacing 
of 350 nm, which demonstrates the capability 
to realize nanophotonic structures at wave- 
lengths approaching the ultraviolet (UV) regime 
(24, 52). The optical micrograph (Fig. 1B, inset) 
shows the structure reflecting light of a blue- 
violet color, along with photonic crystals that 
produce colors of longer wavelength through larger
rod spacings. Furthermore, we printed pris- 
tine nanolattice metamaterials composed of 
thousands of individual bars (Fig. 1, D and E), 
smoothly shaped aspherical microlenses (Fig. 
1, F and G), and complex mesoscale micro-objectives
(Fig. 1, H and I) with ~150-µm
overall size, which contained diffractive lens 
elements with nanoscale details. Overall, our 
POSS-glass process achieved a level of print 
quality, complexity, and coverable size scale 
that was previously only realizable with poly- 
meric structures from standard organic resins. 


Materials characterization 


Our materials characterization confirmed that 
moderate thermal treatment at only 650°C in 
air atmosphere successfully converted the 
POSS resin to pure fused silica. Figure 2 shows 
the results from combined thermogravimetric 
analysis (TGA), differential scanning calorim- 
etry (DSC) and mass spectrometry as well as 
micro-Raman spectroscopy, and transmission 
electron microscopy (TEM). 

Combined TGA, DSC, and mass spectrome- 
try identified the glass conversion of our mate- 
rial to take place between 350° and 650°C (Fig. 2, 
A to C). The material underwent a total mass 
loss of ~65%, with three mass derivative peaks 
at 415°, 480°, and 595°C that correlate with 
three exothermal peaks of the heat flow data. 
These peaks corresponded to three
consecutive reaction stages that are charac- 
teristic of the thermo-oxidative degradation of 
highly cross-linked acrylic polymers (53, 54). 
In the first and second stages, these reaction 
paths include the formation of peroxide 
groups, followed by random chain scission and 
volatilization of produced species such as water, 
carbon dioxide, hydrocarbons, alcohols, and 
higher-mass species (53, 55). Mass spectrometry 
of the exhaust gases confirmed this fragmenta- 
tion as monitored by the molecular ions of 
acetylene (C2H2), 1,2-ethanediol (C2H6O2), and
methylpropionate (C4H8O2). During the first
reaction stage, emissions of all the above
species were present simultaneously with CO2
and H2O. The second stage continued the
decomposition; however, no further higher- 
mass species were formed. In the third and 
final reaction stage, only CO2 and H2O emissions
passed through a maximum, with no increase 
in emissions of monomer-related ions. This indi- 
cates the final reaction stage as the complete 
oxidation of remaining stable hydrocarbon 
impurities. We confirmed this by a control 
TGA and DSC experiment in inert atmosphere 
(fig. S4). The inert decomposition also included
the first two reaction stages, which are primarily 
temperature driven (54, 55); however, it 
completed without a third stage and formed 
chars with marked amounts of residual carbon.
Above 650°C, neither TGA nor DSC showed
any notable further changes, which indicated 
complete volatilization of all organic constituents
and left an inorganic material behind. In
general, oxidizing atmospheres accelerated the 
decomposition processes (55). In a pure-oxygen 
atmosphere, the decomposition of our material 
completed at ~600°C (fig. S4). 

Micro-Raman spectroscopy measurements 
after thermal treatment at progressively increas- 
ing temperatures demonstrated the conversion 
of as-printed organic-inorganic POSS structures 
into fused silica (Fig. 2D). As a reference, we 
provide the spectrum of commercial fused silica. 
Therein, the ω1 and ω3 bands correspond to
bending vibration of the Si(O1/2)4 tetrahedrons’
Si-O-Si bridges, and the ω4 bands are attributed
to the stretching motion of their Si-O bonds (56).
The D1 and D2 lines relate to the symmetric
stretching of silicon-oxygen ring molecules (56).
Distinct from the fused silica signal, the spec- 
trum of as-printed POSS structures was typical 
of a thermoset, for which the strongest peaks 
represent the carbon-carbon (1630 cm⁻¹) and
carbon-oxygen (1720 cm⁻¹) double bonds, whose
intensity ratio can be used to quantify the extent
of cross-linking between the acrylic chains (17).
The signal around 2900 cm⁻¹ corresponded to
the characteristic aliphatic and aromatic stretch- 
ing modes of the carbon-hydrogen single bonds 
(57). At 500°C, the organic microstructure had 
partially disappeared, as demonstrated by the 
absence of the 2900 cm⁻¹ signal. The remaining
associated peaks became smaller and notably 
broadened, which is indicative of increasing 
disorder. This observation is consistent with 
the above simultaneous thermal analysis, which 
confirms the fragmentation and removal of a 
substantial portion of the material’s organic 
groups in the first two reaction stages. Simul- 
taneously, the typical signal of fused silica 
below 1000 cm⁻¹ began to appear. This shows
that the material’s silicon-oxygen POSS-cage 
nanoclusters, which are initially solely connect- 
ed through the cross-linked organic matrix, 
directly start to form a continuous inorganic 
silica network as organic groups decompose 
and volatilize. Above 600°C, the organic peaks 
disappeared entirely, and the spectra took the 
characteristic fused silica shape, which indi- 
cated the material had completely transformed 
into SiO2 at 650°C. In agreement with the TGA,
DSC, and mass spectrometry results, the spec- 
tra collected after treatments above 650°C re- 
vealed the absence of any further compositional 
changes and only showed some microstructural 
reorganization. Between 650° and 800°C, the 
decreasing intensity of the D1 and D2 lines
with respect to the ω1 band indicated the
transition of four- and three-membered ring 
molecules, which may have been inherited 
from the POSS-cage structure, toward tetra- 
hedrons. The disappearance of the small peak 
at 972 cm⁻¹ above 700°C indicated the elimination
of a trace amount of tetrahedral silica
with two nonbridging oxygens (58). Above 


800°C, the spectra of the POSS glass and 
commercial fused silica were identical, and 
no further changes were observed up to the 
maximum temperature tested, 1200°C. 

We used TEM to confirm that our POSS 
glass is pristine SiO2. We took measurements
on a lamella extracted from the center plane of 
a 10-µm-diameter micropillar. Bright-field TEM
micrographs showed a homogeneous amor- 
phous phase without any detectable pores, 
which we confirmed by selected area diffrac- 
tion of the interior of the lamella (Fig. 2E). We 
determined the composition by electron en- 
ergy loss spectroscopy (EELS) at 14 points 
along the center axis of the lamella at varying 
distances from the top surface of the pillar 
(Fig. 2F). We did not detect impurities, and 
the material consisted solely of silicon and 
oxygen, which closely matched stoichiometric
SiO2. We measured 29 ± 1 atomic percent (at
%) silicon and 71 ± 1 at % oxygen; the typical
uncertainties associated with the individual 
EELS quantifications are on the order of 2 to 4 
at % (59, 60). 

Although processed at only 650°C, the POSS 
glass retained perfect geometrical integrity 
upon high temperature exposure, which is 
consistent with the demonstrated chemical 
stability. Dimensional characterizations after 
exposure to increasing temperatures, from 
the as-printed polymer-template state up to 
1200°C, show the TPP-printed template struc- 
tures underwent isotropic linear contraction 
of 42 ± 1% during their thermal conversion.
After 650°C, the resulting fused silica retained 
perfect geometrical integrity up to 1200°C 
without measurable further shrinkage (Fig. 
2G). Correspondingly, even the most delicate 
nanoarchitectures weathered higher temper- 
atures without any distortion, fusion, or other 
damage (Fig. 2H). 
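
Because the contraction is isotropic and reproducible, a print design can be pre-scaled so that the converted part reaches its target dimensions. The sketch below illustrates this with the reported 58% linear retention. The function names are hypothetical, and combining the ~65% TGA mass loss with the measured shrinkage to estimate a densification factor is our own consistency check under the assumption that both values apply to the same printed material.

```python
# Reported values: final size is 58 +/- 1% of the as-printed size (Fig. 2G),
# and TGA shows a total mass loss of ~65%.
LINEAR_RETENTION = 0.58  # final length / as-printed length
MASS_RETENTION = 0.35    # final mass / as-printed mass (1 - 0.65)


def print_size_for_target(target_um, retention=LINEAR_RETENTION):
    """As-printed dimension (um) needed to obtain target_um after conversion."""
    return target_um / retention


def volume_retention(retention=LINEAR_RETENTION):
    """Fraction of the as-printed volume that remains, for isotropic shrinkage."""
    return retention ** 3


def density_ratio(mass_retention=MASS_RETENTION, retention=LINEAR_RETENTION):
    """Final density divided by as-printed density (assumes both inputs
    describe the same material and geometry)."""
    return mass_retention / volume_retention(retention)


if __name__ == "__main__":
    # Example: the 82-um base diameter of the microlenses reported in the text.
    print(f"print diameter: {print_size_for_target(82.0):.1f} um")
    print(f"volume retained: {volume_retention():.3f}")
    print(f"density increase: {density_ratio():.2f}x")
```

Under these assumptions, an 82-µm lens would be printed at roughly 141 µm, about 20% of the template volume remains, and the material densifies by a factor of about 1.8 during conversion.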

Despite being processed at considerably 
lower temperatures, the optical transpar- 
ency of our 3D-printed POSS glass exceeded 
that of previously reported additively manu- 
factured forms of fused silica. We conducted 
UV-visible-near-infrared (UV-Vis-NIR) microspectrophotometry
measurements with free-standing,
25-µm-thick disk-shaped specimens
that were TPP printed from our POSS-precursor 
and converted to fused silica at 650°C (Fig. 3A). 
The POSS glass had excellent optical transmis- 
sion, on par with commercial fused silica. 
Across the measurement range from the UV 
to the NIR spectrum, no absorption bands 
were present (Fig. 3B). By contrast, the trans- 
mission of silica glasses from sol-gel precursors 
(61) that have been 3D printed at the macroscale
and processed at 800°C is reportedly
limited to ~70%, and these glasses are almost completely opaque
in the UV range. Also, the particle-derived 
TPP-printed fused silica (3), sintered at 1100°C, 
did not quite reach the transmission of the 
POSS glass. Consistent with the demonstrated 
structural thermal stability, exposure to 1000°C
did not notably alter the transmission of our 
material (fig. S5). 

The POSS glass further achieves an optically 
smooth surface finish and ultrahigh mechan- 
ical strength. Atomic force microscopy on a 
flat disk measured a root mean-square (RMS) 
roughness of 5.5 nm (Fig. 3C). Compression of 
POSS-glass micropillars treated at 650°C showed 
elastic-plastic behavior with notable plastic 
deformability and 4.0 ± 0.2 GPa strength
(Fig. 3D). Owing to the small scale, which
limits the probability of preexisting flaws, 
this value is four times as high as the com- 
pressive strength of bulk UV-grade fused silica 
(62). Comparably beneficial mechanical behav- 
ior has been reported for opaque TPP-derived 
pyrolytic carbon (63, 64). Treatment at 1000°C 
was found to further increase the strength of 
the POSS glass (fig. S6) (57). The measured 
Young’s moduli of up to 67 GPa were within 
the range of common forms of dense fused 
silica (65). Our POSS glass has more than an 
order of magnitude higher strength and stiff- 
ness (17) than the state-of-the-art polymers 
that hold the current benchmark for TPP- 
printed high-fidelity micro-optics. 


Optical device demonstration 


We demonstrate our material enables the 
fabrication of free-form fused silica glass 
micro-optical elements with excellent optical 
performance (Fig. 3, E to G). Lens systems for 
imaging and beam shaping are among the 
most important micro-optical devices. How- 
ever, the highest-precision glass microlenses 
(66) have thus far been fabricated by subtrac- 
tive top-down approaches, which are limited 
to simple designs that, for example, cannot 
correct for aberrations. Here, we TPP printed 
planoconvex fused silica microlenses with an 
aspheric profile, which was numerically opti- 
mized to correct for spherical aberrations. 
The final POSS-glass lenses, with a base diameter
of 82 µm and 15 µm sagittal (sag) height,
were treated at 650°C and were of pristine 
structural quality with finely resolved nano- 
scale contours and smooth surfaces (Fig. 3E). 
We conducted optical profilometry measure- 
ments (Fig. 3F) to confirm the excellent shape 
accuracy with a peak-to-valley deviation of 
the lens profile with respect to the aspheric 
design of ±175 nm. The measured RMS roughness
was 8.1 nm (fig. S7), which translates to
an RMS-to-sag ratio of 0.05%. These values are 
on par with the latest achievements with 
polymeric TPP-printed lenses (67), which
report shape deviations of 0.1 to 0.5 µm and
4- to 15-nm RMS roughness, and within the
specifications of the highest-quality commer- 
cial glass microlenses fabricated by reactive 
ion etching or ion exchange techniques, for 
which RMS/sag ratios of 0.01 to 0.09% are 
reported (66). Optical resolution measurements 
with a 1951 USAF-type resolution target under
white light illumination demonstrated the 
excellent imaging performance of our micro- 
lenses. Figure 3G shows images formed by the 
microlenses of the target, which we projected 
onto a complementary metal-oxide semi- 
conductor camera sensor with an optical micro- 
scope system. The visible labels indicate the 
respective pattern elements’ number of line 
pairs per millimeter (lp/mm), and the inset
graphs show the measured intensity contrast 
between adjacent line elements. We were able 
to resolve up to 700 lp/mm with ~6% remaining
contrast with our microlenses, meaning
that 714-nm-sized features remained distin- 
guishable. This approximately corresponds to 
group 9, element 4 of the 1951 USAF target, 
which notably outperforms previously reported 
inorganic planoconvex microlenses that were 
TPP printed from sol-gel precursors (31, 33, 68), 
whose resolution capability is reported within 
groups 4 to 7 of the 1951 USAF target. 
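
The lens figures quoted above follow from short conversions, which the sketch below reproduces. It assumes the standard 1951 USAF target relation, resolution = 2^(group + (element - 1)/6) lp/mm, and treats one line pair as one bright plus one dark line of equal width; the function names are illustrative.

```python
def line_width_nm(lp_per_mm):
    """Width of a single line (nm) at a given spatial frequency.
    One line pair spans two line widths, and 1 mm = 1e6 nm."""
    return 1e6 / (2.0 * lp_per_mm)


def usaf_lp_per_mm(group, element):
    """Spatial frequency of a 1951 USAF target element in lp/mm."""
    return 2.0 ** (group + (element - 1) / 6.0)


def rms_to_sag_percent(rms_nm, sag_um):
    """RMS roughness as a percentage of the lens sag height."""
    return 100.0 * rms_nm / (sag_um * 1000.0)


if __name__ == "__main__":
    print(f"700 lp/mm -> {line_width_nm(700):.0f} nm lines")               # about 714 nm
    print(f"group 9, element 4 -> {usaf_lp_per_mm(9, 4):.0f} lp/mm")       # about 724 lp/mm
    print(f"RMS/sag for 8.1 nm over 15 um -> {rms_to_sag_percent(8.1, 15.0):.3f}%")  # 0.054%
```

Group 9, element 4 evaluates to about 724 lp/mm, slightly above the 700 lp/mm resolved, which is why the correspondence is stated as approximate.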


Conclusion 


The POSS-glass TPP 3D printing route may 
help redefine the paradigm for the free-form 
manufacturing of silica glass and overcome 
the fundamental limitations of the particle- 
based approaches that have dominated the field. 
The crucial innovation of our approach lies in 
the developed POSS resin, which, contrary to a 
particle-loaded binder, is not sacrificial but 
itself polymerizes into a continuous silicon- 
oxygen molecular network. Hence, the mate- 
rial circumvents the extreme temperatures 
that are otherwise required to sinter discrete 
silica particles to a continuum (1-4), which
enables conversion to fused silica at only 650°C. 
By constituting a temperature reduction of 
~500°C with respect to the best reported 
TPP approaches (2, 3), this brings the free- 
form synthesis of silica glass below the melting 
points of essential materials for microsystems 
technology, including silver, copper, gold, and 
aluminum. This represents a breakthrough that 
enables the evolution of on-chip 3D printing 
of transparent matter from state-of-the-art 
organic polymers to resilient optical-grade 
fused silica. Similarly, our POSS-glass process 
breaches the critical resolution limit to realize 
free-form silica nanophotonic devices in the 
visible light spectrum (24, 52) while simul- 
taneously being capable of manufacturing 
hundreds of micrometer-sized high aspect 
ratio structures. Overall, we achieved attractive 
combinations of optical quality, mechanical 
resilience, processing ease, and coverable size 
scale and set the benchmark for the micro- 
and nanoscale 3D printing of inorganic solids 
in general. 

The potential fields of application of our POSS 
glass are widespread, ranging from micro-optics 
and photonics, MEMS, and microfluidic and 
biomedical devices to fundamental research. Examples
include aging- and environment-resistant
ultracompact imaging systems (18) for applications
from medical endoscopes to consumer
electronics; superior-accuracy sensors, whose 
3D design today typically limits them to 
centimeter-sized devices for costly applica- 
tions, such as deep space missions (69); and 
beam-shaping elements (79) for the end faces 
of diode lasers, which are the basic compo- 
nents for most high-power laser applications 
but whose output power cannot be sustained 
by polymers. In fracture mechanics research, 
fused silica is a model material (70); however, 
specimen geometries are often nontrivial and 
challenging to manufacture. The design free- 
dom of our POSS-glass process enables us to 
systematically investigate fracture mechanisms 
at the smallest scale, which includes metamaterials,
such as nanolattices (71, 72).

Miniature magneto-mechanical resonators
for wireless tracking and sensing

Bernhard Gleich¹, Ingo Schmale¹, Tim Nielsen¹, Jürgen Rahmer¹*


Sensor miniaturization enables applications such as minimally invasive medical procedures
or patient monitoring by providing process feedback in situ. Ideally, miniature sensors should be
wireless, inexpensive, and allow for remote detection over sufficient distance by an affordable
detection system. We analyze the signal strength of wireless sensors theoretically and derive a
simple design of high-signal resonant magneto-mechanical sensors featuring volumes below 1 cubic
millimeter. As examples, we demonstrate real-time tracking of position and attitude of a flying bee,
navigation of a biopsy needle, tracking of a free-flowing marker, and sensing of pressure and temperature,
all in unshielded environments. The achieved sensor size, measurement accuracy, and workspace of
~25 centimeters show the potential for a low-cost wireless tracking and sensing platform for medical and
nonmedical applications.


The rise of minimally invasive procedures
has created a need for tracking medical
instruments inside the human body,
which is currently satisfied by cabled
electromagnetic trackers (1), optical tracking
approaches based on cameras (2) or optical
fibers (3), imaging-based markers (4),
and wireless radiofrequency (RF) markers (5). 
Specific drawbacks of each method, however, 
limit their general usability. Not all body lo- 
cations can be reached with a wire, an optical 
line of sight is usually not available inside the 
human body, imaging equipment is costly or 
may use harmful radiation, and wireless RF 
markers require a minimal size of roughly 10 mm 
to accommodate an antenna for communi- 
cation with a detector outside the body (5). 
Another field that suffers from current tech- 
nology limitations is the data-driven evalua- 
tion of physiological parameters (6), e.g., for 
early detection of disease but also for moni- 
toring patients at home to reduce time in the 
hospital (7). In these applications, small low- 
cost sensors are required to continuously 
monitor and report physiological param- 
eters not only from the surface (8) but also 
from inside the human body. Based on the- 
oretical insights, we designed a technology 
platform that can address these needs by 
combining existing technologies with mini- 
aturized sensors ~1 mm in size. The plat- 
form may serve many applications, e.g., the 
navigation of surgical devices and cathe- 
ters (2), home monitoring of blood pressure 
(9), radiation-free gastric emptying studies 
(10), controlling oral medication adherence
(11, 12), monitoring insect behavior (13, 14), 
or labeling of goods with sensors acting as 
micro radiofrequency identification (RFID) 
tags (15). 


¹Philips Research, Hamburg, Germany.

The basic concept of our magneto-mechanical
resonator (MMR) is displayed in Fig. 1A. The 
resonator consists of two spherical NdFeB magnets
(16), one fixed to the cylindrical housing
and the other suspended from a thin filament. 
The suspended sphere is held in place by the 
magnets’ mutual attractive force, which sur- 
passes the gravitational force by 3 orders of 
magnitude. The torque exerted by the fixed 
magnet’s field drives the magnetic moments 
into an antiparallel alignment. To start a ro- 
tational oscillation, an external magnetic field 
pulse is applied whose torque creates an an- 
gular deflection of the suspended sphere (movie 
S1 and Fig. 1A, white double arrow). The oscillation
frequency is determined by the restoring
torque provided by the fixed sphere
and thus depends on the distance between the 
spheres. The principle of our MMR detection 
system is described in Fig. 1B. It generates 
current pulses that are transmitted through 
electromagnetic coils (Fig. 1C) to excite the 
MMR oscillation. The oscillating magnetic 
moment then induces a voltage in these coils 
which, after amplification, represents the re- 
ceived signal. The low friction in the MMR 
filament bearing translates into a slow sig- 
nal decay, so that short excitation periods can 
alternate with long windows for signal acqui- 
sition. Figure 1D displays the voltage induced 
in one of the 16 receive coils by an MMR res- 
onating at 2.2 kHz. It is repeatedly excited by 
short pulse trains. Before each re-excitation, 
the system performs a real-time evaluation of 
the acquired data to adjust excitation pulse 
frequency, phase, and amplitude for optimal 
buildup of the MMR oscillation. The signals ob- 
tained from the different coils encode spatial 
position and orientation of the MMR. Because 
the planar oscillation of the magnetization 
vector creates two orthogonal signal com- 
ponents (see supplementary materials), the 
position and full orientation information, i.e., 
6 degrees of freedom (DoF), can be reconstructed
using the known spatial sensitivity
profiles of the coils. 
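
How the signal distribution over the coils carries position information can be illustrated with a simple forward model. The sketch below is not the authors' reconstruction: it treats each coil of a 4 x 4 array as a point sensor of the vertical field component, uses the ideal magnetic dipole field, and assumes a hypothetical coil pitch of 60 mm and a dipole moment of 6.8e-5 A·m², none of which are given in the text.

```python
import math

MU0_OVER_4PI = 1e-7  # T*m/A


def dipole_bz(sensor, source, moment):
    """Vertical (z) field component of a point dipole at `source`, seen at `sensor`."""
    rx, ry, rz = (s - p for s, p in zip(sensor, source))
    r2 = rx * rx + ry * ry + rz * rz
    r = math.sqrt(r2)
    m_dot_r = moment[0] * rx + moment[1] * ry + moment[2] * rz
    # B = mu0/(4 pi) * (3 (m.r) r / r^2 - m) / r^3, z component only
    return MU0_OVER_4PI * (3.0 * m_dot_r * rz / r2 - moment[2]) / r ** 3


def array_signals(source, moment, pitch=0.06, n=4):
    """Signal pattern over an n x n planar array at z = 0 (pitch is an assumed value)."""
    offset = (n - 1) * pitch / 2.0
    return {
        (i, j): dipole_bz((i * pitch - offset, j * pitch - offset, 0.0), source, moment)
        for i in range(n)
        for j in range(n)
    }


# Dipole 100 mm above coil (2, 2), moment pointing along z.
signals = array_signals(source=(0.03, 0.03, 0.10), moment=(0.0, 0.0, 6.8e-5))
strongest = max(signals, key=lambda k: abs(signals[k]))
print(strongest)
```

An actual tracker would invert such a model, fitting position and orientation to the measured amplitudes of all coils and both orthogonal signal components; the forward model only shows why the amplitude pattern changes with marker position.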

The signal amplitude distribution over the 
coils is used for tracking, while the frequency 
of the MMR signal does not carry spatial infor- 
mation. It can be used to measure a physical 
parameter that changes distance between the 
two magnetic spheres and thus modulates the 
oscillation frequency. Examples are temperature 
through thermal expansion or pressure through 
compressible housing. Materials that respond 
to radiation or certain chemicals enable dosim- 
eters or chemical sensors, respectively. 
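
The dependence of the resonance frequency on the sphere distance can be made concrete with a simplified model. The sketch below is an illustration under stated assumptions, not the derivation from the supplementary materials: both spheres are treated as ideal dipoles with moments perpendicular to the line joining them, the suspended sphere is a rigid torsional oscillator, and the remanence (1.3 T), density (7500 kg/m³), and center distance (0.58 mm) are assumed values. Only the sphere diameter of 0.5 mm and the resonance near 2.2 kHz come from the text.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, T*m/A


def mmr_frequency_hz(distance_m, radius_m=0.25e-3,
                     remanence_t=1.3, density_kg_m3=7500.0):
    """Small-angle resonance frequency of the suspended sphere (dipole model).

    remanence_t and density_kg_m3 are assumed typical NdFeB values.
    """
    volume = 4.0 / 3.0 * math.pi * radius_m ** 3
    moment = remanence_t / MU0 * volume                      # A*m^2
    # Field of the fixed dipole at the suspended sphere (side-by-side geometry).
    field = MU0 * moment / (4.0 * math.pi * distance_m ** 3)  # T
    inertia = 0.4 * density_kg_m3 * volume * radius_m ** 2    # solid sphere
    # Restoring torque for small angles is moment * field * angle.
    return math.sqrt(moment * field / inertia) / (2.0 * math.pi)


f0 = mmr_frequency_hz(0.58e-3)
print(round(f0))
# Frequency change for a 1 micrometer reduction of the sphere distance.
print(round(mmr_frequency_hz(0.579e-3) - f0, 1))
```

In this model the frequency scales with distance to the power -3/2, so a small compression or thermal expansion of the housing translates into a measurable frequency shift.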


Results 


To highlight the small marker footprint, a bee 
equipped with an MMR (~1.5 mg) is tracked 
in real time while walking and flying at dis- 
tances up to 200 mm from the coil array (Fig. 2 
and movies S2 and S3). The raw measurement 
rate is ~40 Hz and each measurement delivers
the bee’s momentary position (x, y, z) and
attitude (pitch, yaw, roll), i.e., 6-DoF information.
According to both the tracking data and
camera view, the flying bee achieved velocities 
up to 600 mm per second. 
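
A quick calculation shows how far the bee can move between two consecutive measurements at the quoted values. This is a rough upper bound that assumes straight-line motion at top speed; the function name is illustrative.

```python
def displacement_per_measurement_mm(speed_mm_per_s: float, rate_hz: float) -> float:
    """Distance travelled during one measurement period at constant speed."""
    return speed_mm_per_s / rate_hz


# 600 mm/s at a raw measurement rate of ~40 Hz
print(displacement_per_measurement_mm(600.0, 40.0))
```

At these values the marker moves about 15 mm per measurement period, which is large compared with the millimeter-scale marker itself.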

For demonstrating medical navigation, an 
MMR is integrated in a stylet that is inserted
in a curved biopsy needle (Fig. 3A). Because of
the low resonance frequency of 2.2 kHz, the
titanium alloy of the needle only weakly attenuates
the MMR signal. Therefore, accurate
needle tip tracking over the volume of a
gelatin phantom (Fig. 3B, diameter 135 mm) 
is possible, with ~140 mm distance between 
the needle and coil array. The 6-DoF informa- 
tion enables accurate navigation of the needle 
toward a target (Fig. 3C and movie S4). Nav- 
igation could be fully based on a “roadmap” 
provided by a medical imaging system whose 
frame of reference is aligned with the tracking 
system. No line of sight to the needle tip is
required and the camera view has only been
included for reference. To simulate tracking 
of an ingestible marker during gastrointesti- 
nal passage, an untethered MMR is localized 
while flowing through a winding tube phan- 
tom (fig. S5 and movie S7). 

To demonstrate sensing, a sealed MMR pres- 
sure sensor is subjected to pressure changes 
(Fig. 4). The diffusion-tight metallic housing 
(Fig. 4A) is compressible, similar to a common 
aneroid barometer. Sensitivity and measure- 
ment range of the sensor can be tuned through 
the stiffness of the housing. The sensor uses 
two oscillating spheres to reduce sensitivity to 
static magnetic background fields. Figure 4B 
shows the resonance frequency extracted from 
the measured signal while pressure changes are 
applied manually using a syringe. For reference, 
the pressure measured by a commercial pres- 
sure sensor is plotted in Fig. 4C, showing 
that frequency directly represents pressure. A 
pressure range of 400 mbar (300 mmHg) is
covered, which is sufficient for a blood pressure
sensor (9) and leaves room for pressure
offsets as a result of different geographic altitudes.
Sensitivity of the sensor is 0.34 Hz
per mbar. Figure S4 and movie S5 show that
the tracking marker of the needle navigation
experiment can also be used for measuring
temperature.

Fig. 1. MMR system components and signal response. (A) MMR demonstrator
in relation to a coin (1 Euro Cent, diameter 16.25 mm) and sketch of its design.
The suspended magnetic sphere (diameter 0.5 mm) can perform a rotational
oscillation (white double arrow, see movie S1) about the long axis of the cylinder,
which has a volume of 0.96 mm³. In equilibrium, the spheres have antiparallel
alignment (red, magnetic north pole; green, south pole). (B) Schematic of transmit
and receive (Tx and Rx, respectively) detection system with n channels. A
field-programmable gate array (FPGA) is used for real-time control of the Tx and Rx data
streams. The transmit pulse trains are generated by digital-to-analog converters
(DAC) and sent through power amplifiers (PA) to the coils of the array. The receive
signals pass through low-noise amplifiers (LNA) to the analog-digital converters
(ADC) connected to the FPGA. A switch protects the receive path during application
of the excitation pulses. (C) 4 x 4 coil array used for MMR excitation and detection
of its spatial signal profile. (D) Overview and magnification of a typical MMR time
signal. Brief excitation windows (overlaid by yellow-green boxes) alternate with
receive periods during which signal decays. Starting from equilibrium, excitation
pulses are played repeatedly with correct timing to build up the oscillation amplitude
over several transmit and receive cycles.


Scaling Laws


The experiments demonstrate MMR tracking
and sensing using millimeter-sized devices.
This miniaturization was achieved by designing
the MMRs for maximal signal. For comparison
with existing technology, we compare
the signal of MMRs with conventional passive
RF circuits. RF circuits are also called LC
resonators because the resonant circuit contains
an inductor L and a capacitor C, with
the inductor acting as an antenna (circuit
diagram in fig. S3A) (5, 9). For generating a
signal that is detectable during the acquisition
phase, MMRs and LC resonators must first be
driven to sufficiently high oscillation amplitudes
using excitation fields of amplitude $B_1$ at
the device resonance frequency $f_0$. The dominant
limiting factor to the applicable field
amplitude is the rate of field change
$R = 2\pi f_0 B_1$, which describes the magnitude of field
change per time. R determines how much
voltage is induced in surrounding materials
and is thus restricted by safety limits on touch
voltages on metallic surfaces and by physiological
limits such as peripheral nerve stimulation
and tissue heating (17, 18). To quantify
the achievable signal, we introduce a normalized
device signal $\Xi$ as a function of rate R and
radius r of the field-generating element in the
wireless device.

Fig. 2. Bee tracking experiments. (A) Honeybee equipped with an MMR marker. (B) Tracked path of a bee
walking upside down below the transparent lid of a box garnished with a bunch of meadow flowers. The lid is
~20 cm above the planar 4 x 4 coil array used for detection. The path of the bee reconstructed from the MMR
signal is plotted with an overlay of images extracted from a video sequence (see movie S2). The black and
gray lines indicate the bee's attitude in space for the last displayed measurement, and the previous orientations of
the long bee axis (black line) are color coded in the plotted path (red, green, and blue lines indicate bee axis
aligned along the x, y, and z directions, respectively). (C) Brief segment of a tracking experiment of a flying bee.
(Right) The graphs show orthogonal projections of the reconstructed bee positions to visualize the 3D flight
path relative to the box (indicated by violet lines). From each measurement, position and attitude of the bee are
obtained as shown by the lines plotted at the last position (black, long axis of bee; dark gray, lateral axis; light
gray, vertical axis). (Left) Extracted movie frames show the good correlation of the attitude obtained from
MMR tracking with the attitude seen by the two cameras.

For MMRs, r is the radius of the oscillating
spherical magnet (cf. Figure 5A) and the key
finding is the proportionality:


=MMR oc R3 ri (1) 


For LC resonators, r is the outer radius of the
coil element (cf. Figure 5B) and the device 
signal scales as: 


ZreeR e (2) 


The complete formulae and their derivation 
are found in the supplementary materials. 


Figure 5C displays device signals $\Xi_{\mathrm{MMR}}$
and $\Xi_{\mathrm{LC}}$ as a function of r for different rates
of field change R. In view of safety regula- 
tions, a rate of R ~ 1000 mT/s at the device 
position is assumed to be the highest tol- 
erable rate in the frequency range between a 
few kilohertz and 100 MHz. The combina- 
tion of this limit with a minimal detectable 
signal threshold leads to triangular regions 
in the plots, indicating the signal and size 
combinations which the respective technology 
can provide. For LC resonators the triangular 
operating region is shaded in blue whereas the 
region for MMR devices is shaded in green. 
According to the scaling laws, the operating 
region for the MMRs extends to smaller de- 
vice sizes. The needle tracking experiment is 
indicated as “MMR.” Because of power lim- 
itations of the transmit amplifiers of our de- 
tection system, it does not fully exploit the 
theoretical size reduction potential for this 
signal level. For comparison, an LC resonator 
would need to be almost one order of magnitude
larger in linear dimension r to reach the
same signal level. 
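
The rate limit translates directly into a maximum excitation field amplitude at a given frequency. The sketch below rearranges the relation between rate of field change, resonance frequency, and field amplitude stated earlier (rate equals 2π times frequency times amplitude) and evaluates it for the assumed tolerable rate of 1000 mT/s; the function name is illustrative.

```python
import math


def max_field_amplitude_t(rate_t_per_s: float, frequency_hz: float) -> float:
    """Largest sinusoidal field amplitude (tesla) allowed by a rate-of-change limit."""
    return rate_t_per_s / (2.0 * math.pi * frequency_hz)


RATE_LIMIT = 1.0  # 1000 mT/s expressed in T/s

# MMR operating frequency from the experiments (2.2 kHz)
print(max_field_amplitude_t(RATE_LIMIT, 2.2e3))
# Upper end of the frequency range mentioned in the text (100 MHz)
print(max_field_amplitude_t(RATE_LIMIT, 100e6))
```

At 2.2 kHz the limit allows an amplitude of roughly 72 microtesla, whereas at 100 MHz only about 1.6 nanotesla remain, which illustrates why a low resonance frequency permits much stronger excitation under the same rate limit.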

For sensing applications, information is en- 
coded in frequency and therefore frequency 
resolution $\Delta f$ relates to measurement accuracy.
For assessing sensor performance, the device
signal $\Xi$ must thus be weighted with a quality
function $\varphi$ coupled to frequency resolution, as
described in the methods section. The respective
products $\Xi \cdot \varphi$ are plotted in Fig. 5D, showing
that the relative miniaturization potential
of MMR technology compared with LC tech- 
nology is even higher for sensors than for 
markers. 


Discussion and Conclusion 


MMR technology enables miniaturization of 
wireless markers and sensors by ~1 order of 
magnitude in the linear dimension compared 
with existing LC resonator technology; the 
required field generator and detection system 
have a similar footprint. The marker used in 
the needle tracking experiment has a length 
of 1.9 mm whereas an LC-based product with 
similar workspace has a length of 8 mm (5). 
Commercial miniature RFID tags used in the 
bee experiments (13) have a size similar to
that of the MMR but provide so little signal 
that they can only be detected up to a few 
millimeters. The MMR pressure sensor has 
a length of 1.8 mm compared with 11 mm for 
the coil element of a commercial miniature 
pressure sensor (9). 

Miniaturization is enabled by the high MMR 
signal, which results from several aspects (see 
eq. S1 in materials and methods): First, the
high magnetization of NdFeB permanent mag- 
nets (16) leads to an efficient energy trans- 
fer to and from the MMR. Second, the low 
dissipation in the filament minimizes energy 
losses and results in very high quality factors.

Fig. 3. Curved needle navigation experiment. (A) For demonstration, the MMR is glued into a recess cut
into a flexible stylet that is inserted into a curved biopsy needle. The diameter of the MMR housing is 0.8 mm,
the diameter of the stylet is 1.3 mm, and the outer diameter of the needle is 1.65 mm (16G Birmingham
gauge). (B) Gelatin phantom simulating a patient. Two “bones” and one “vessel” block direct access to
the “target lesion.” (C) Projection and oblique view of reconstructed needle path inside the phantom (see
movie S4). The black dot marks the needle tip derived from the MMR position and orientation. The black
line points along the needle axis and the gray line marks the axial rotation to enable controlled needle
rotation. For navigation, a slice of a 3D computed tomography (CT) data set of the phantom has been
projected (and is therefore deformed) on the xz view. The shaky course of the needle is mainly caused by
stick-slip movement of the needle in the rather hard gelatin phantom material.


Third, the filament bearing allows high angu- 
lar oscillation amplitudes of the suspended 
sphere, leading to large magnetization changes 
that induce high voltages in the detection coils. 
Fourth, the magnetic restoring torque pro- 
vided by the second magnet corresponds to a 
low stiffness or torsion constant that results in 
a rather low resonance frequency when com- 
pared with purely mechanical resonators. When 
the rate of excitation field change is limited— 
as is the case in most practical applications—a 
lower frequency enables higher angular os- 
cillation amplitudes and thus increased sig- 
nal. Our mathematical derivation shows that 
frequencies around a few kilohertz are optimal 
for MMRs, whereas much higher frequencies are 
optimal for LC resonators. Low frequencies have 
the benefit of inducing fewer eddy currents in 
metallic objects or a patient body and thus re- 
duce shielding effects. This is demonstrated in 
the curved needle experiment, where the MMR 
is detected inside a metallic needle.

The MMR design also overcomes limitations
encountered by magnetic micro-electromechanical 
systems (MEMS) for wireless actuation and 
sensing applications (19-21). MEMS processes 
are limited to low remanence magnetic mate- 
rials, and resonators typically use rather stiff 
mechanical elements resulting in high fre- 
quencies but low oscillation amplitudes, lead- 
ing to low signal. Furthermore, dissipation 
in mechanical elements is generally higher than 
in magnetic restoring elements, which limits 
MEMS quality factors and thus frequency res- 
olution for sensing applications. 

The simplicity of MMRs may enable simple 
manufacturing and low cost. The housing can 
be made from glass or plastic; the filament and 
the NdFeB magnets are also inexpensive. The 
MMR assembly does not require high precision, 
as the magnetic forces automatically center 
the oscillating sphere. These benefits com- 
bined with the small size of MMRs and their 
wireless detection distance of ~25 cm (see
supplementary materials, fig. S6) could improve
or enable a wide range of applications. 

One potential medical application could be 
medication adherence control, where an MMR 
is integrated in a pill whose presence can be 
detected wirelessly inside a patient’s gastro- 
intestinal (GI) tract. Existing approaches either 
require detection patches with body contact (11)
or centimeter-sized electronic pills (12). MMR 
tracking in the GI tract as simulated by the 
phantom experiment in the supplementary 
materials (fig. S5 and movie S7) could also 
deliver dynamic information on gastric emp- 
tying (10) and bowel motility (22). Here, sev- 
eral MMR markers with different resonance 
frequencies could be operated in parallel to 
collect information from many locations simultaneously,
an advantage over magnetic tracking
technologies using larger static magnets
(23, 24). A proof-of-principle experiment on 
simultaneous tracking of 3 MMRs operating 
at different frequencies is presented in fig. S7 
and movie S8. Furthermore, MMR sensing in 
the GI tract could simultaneously deliver body 
core temperature (25), peristaltic pressure, pH 
value, or bowel content viscosity. In surgical 
applications, MMRs could be used for mark- 
ing tumor tissue to guide excision. Current 
nonradioactive solutions require larger mark- 
ers while having a smaller detection distance 
than MMRs (26, 27). For monitoring and tele- 
medicine solutions, tiny MMR sensors could 
be operated directly in the blood stream, e.g., 
for measuring physiological parameters such 
as blood pressure (9) or for functional moni- 
toring of implanted devices, e.g., early detec- 
tion of clogging in vascular stents. 

MMRs could also be added to medical instruments
such as needles, catheters, guidewires,
or bronchoscopes to simplify in-body
navigation (2) without the need for imaging, 
which is costly and often involves harmful 
x-ray radiation. The magnetic detection mech- 
anism avoids line-of-sight problems of camera- 
based tracking systems. In addition, wireless 
MMRs promise simpler integration in devices
and better workflow than cable- or fiber-
dependent solutions (3). Furthermore, position 
and full orientation information (6 DoF) can 
be retrieved from a single MMR. With wireless 
LC resonators (5) or conventional wired elec- 
tromagnetic tracking coils (28), at least two 
markers must be combined to deliver all three 
angles that determine orientation in space. 

MMR technology could also be useful in 
nonmedical applications, e.g., RFID tagging. 
Millimeter-sized MMRs could be integrated 
in products and consumables where current 
RFID tags are either too large or too limited in 
detection distance. For identification, differ- 
ences in resonance frequency, damping con- 
stant, magnetic dipole moment, and various 
nonlinear properties could be used to discern 
millions of MMRs by their signal response.

Fig. 4. Pressure sensor design, demonstrator, and measured pressure variation. (A) A compressible housing translates outer pressure variations to distance
changes between two oscillating spheres. The diffusion-tight metal housing of the demonstrator provides the required stiffness and is coated with silicone rubber
for a biocompatible surface. The sensor volume is 0.51 mm³. (B) Application of external pressure using a manually operated syringe changes the resonance frequency
of the MMR. Thereby, an increasing pressure reduces the MMR intersphere distance and increases the frequency. The MMR is placed ~70 mm above the coil 
array. (C) For reference, a commercial pressure sensor provides the applied pressure. 
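
The frequency-to-pressure readout of the sensor can be sketched with the reported sensitivity of 0.34 Hz per mbar. The example assumes a linear response over the whole 400 mbar range and uses a hypothetical reference frequency of 2200 Hz at the reference pressure; both are illustrative choices, and a real sensor would need calibration.

```python
SENSITIVITY_HZ_PER_MBAR = 0.34   # reported sensor sensitivity
MMHG_PER_MBAR = 0.750062         # unit conversion factor


def pressure_change_mbar(frequency_hz: float, reference_hz: float) -> float:
    """Pressure change relative to the reference point, assuming a linear response."""
    return (frequency_hz - reference_hz) / SENSITIVITY_HZ_PER_MBAR


def mbar_to_mmhg(pressure_mbar: float) -> float:
    return pressure_mbar * MMHG_PER_MBAR


# A frequency increase of 136 Hz corresponds to the full 400 mbar range.
delta = pressure_change_mbar(2336.0, 2200.0)
print(round(delta, 1), round(mbar_to_mmhg(delta), 1))
```

Because increasing pressure raises the frequency, a positive frequency shift maps to a positive pressure change in this sketch.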
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Fig. 5. Signal scaling comparison between MMR and LC-type resonators 
for marker and sensor applications. For markers, device signal $\Xi$ determines
tracking performance, whereas for sensors, performance depends on signal
$\Xi$ multiplied with a quality function $\varphi$ that reflects frequency resolution.
(A) The “resonator radius” r corresponds to the radius of the oscillating MMR
sphere. (B) For LC resonators, r is the radius of a cylindrical antenna coil of
height 2r and conductor thickness r/4. (C) Double logarithmic plot of signal
from MMRs (green lines) and LC resonators (black lines) for different applied
rates of field change R (different dash styles). A minimal signal requirement
for detecting the devices up to reasonable distances of ~20 cm in an unshielded
environment is shaded in light red. The blue area indicates where LC resonators
can be operated. The green area shows the region only accessible to the MMR
technology. The circles annotated with “MMR” and “LCQ"” illustrate values of the marker
demonstrators (supplementary materials) presented in this work, and “LC marker”
and “LC sensor” represent typical values of optimized devices. (D) Double logarithmic
plot of weighted device signal $\Xi \cdot \varphi$. The quality function $\varphi$ is a constant factor for
MMRs whereas for LC sensors it scales with the squared coil radius. The resulting
limited frequency resolution prevents shrinking LC sensor size below a few millimeters.


There are limitations of the MMR technology
that may impact certain applications.
A general limitation is the achievable signal
level versus noise and background signals. Although
our demonstrations were performed
with millimeter-sized MMRs up to a distance
of ~25 cm in an unshielded environment, a
potential need for even smaller devices, higher
accuracy, or larger workspaces may require
advanced background signal subtraction strategies.
The system could also be operated in a
shielded environment, which would allow miniaturization
of the magnetic elements to a few
micrometers (29). Further limitations are systematic
errors caused by nearby ferromagnetic
or metallic objects. Stray fields from ferromagnets
can shift the MMR resonance frequency
and affect sensing accuracy. Eddy currents
induced in metals can distort the dynamic
magnetic fields and thus affect localization
accuracy, an effect shared with wire-bound
electromagnetic tracking (2). A limitation for
tracking of very fast objects, such as a flying
bee, is that position and orientation change
within the signal acquisition window.

In conclusion, the presented MMR design 
enables shrinking wireless markers and sen- 
sors to the millimeter range while maintain- 
ing sufficient signal and sensitivity for remote 
detection. Demonstrations of tracking, device 
navigation, and sensing show the potential for 
a platform technology that can cover a wide 
range of applications.

CIRCADIAN RHYTHMS


Rhythmic cilia changes support SCN neuron 
coherence in circadian clock 


Hai-Qing Tu’+, Sen Li't, Yu-Ling Xu'}, Yu-Cheng Zhang"t, Pei-Yao Li’, Li-Yun Liang’, 
Guang-Ping Song’, Xiao-Xiao Jian’, Min Wu’, Zeng-Qing Song’, Ting-Ting Li’, Huai-Bin Hu’, 
Jin-Feng Yuan’, Xiao-Lin Shen’, Jia-Ning Li’, Qiu-Ying Han’, Kai Wang’, Tao Zhang®, Tao Zhou’, 


Ai-Ling Li*?, Xue-Min Zhang??*, Hui-Yan Li??* 


The suprachiasmatic nucleus (SCN) drives circadian clock coherence through intercellular coupling,
which is resistant to environmental perturbations. We report that primary cilia are required for
intercellular coupling among SCN neurons to maintain the robustness of the internal clock in
mice. Cilia in neuromedin S-producing (NMS) neurons exhibit pronounced circadian rhythmicity
in abundance and length. Genetic ablation of ciliogenesis in NMS neurons enabled a rapid phase
shift of the internal clock under jet-lag conditions. The circadian rhythms of individual neurons
in cilia-deficient SCN slices lost their coherence after external perturbations. Rhythmic cilia changes
drive oscillations of Sonic Hedgehog (Shh) signaling and clock gene expression. Inactivation of
Shh signaling in NMS neurons phenocopied the effects of cilia ablation. Thus, cilia-Shh signaling in
the SCN aids intercellular coupling.


All mammals have an internal circadian
clock (~24 hours) that regulates daily
oscillations in metabolism, physiology,
and behavior, such as rest-activity and
sleep-wake cycles (1). The suprachiasmatic
nucleus (SCN) acts as the master circadian
pacemaker (2, 3). Its autonomous and
coherent oscillatory output signals orchestrate 
the peripheral clocks in multiple tissues through- 
out the body (4, 5). Environmental circadian 
disruptions, such as acute jet lag and long-term 
shift work, cause temporal unsynchronization 
between the internal circadian clock and ex- 
ternal time cues, leading to physiological stress 
(6, 7). Circadian disruption has been implicated 
in tumorigenesis and various psychiatric, neu- 
rological, and metabolic diseases, includ- 
ing depression and diabetes (8, 9). 

The SCN contains a heterogeneous popula- 
tion of ~20,000 neurons, most of which can 
individually generate autonomous circadian 
oscillations (10, 11). These oscillations are driven
by autoregulatory transcription-translation feed- 
back loops (TTFLs) of clock genes (12, 13). The 
function of SCN as the master pacemaker relies 
on intercellular coupling, a process that syn- 
chronizes period and phase among SCN neu- 
rons (14, 15). Intercellular coupling enables the 
SCN to generate robust and coherent oscilla- 
tions at the population level that are resistant 
to environmental perturbations (16, 17). Several
neurotransmitters, including vasoactive intestinal
peptide (VIP), γ-aminobutyric acid (GABA),
and arginine vasopressin (AVP), function in
maintaining this intercellular coupling (18-22).
The release of these neurotransmitters is regu- 
lated by clock genes, the transcription of which 
can be further activated by the neurotransmit- 
ters (23). Thus, these neurotransmitters and 
clock genes form a feedforward loop to maintain 
intercellular coupling in the SCN. 

The primary cilium, a sensory organelle 
nucleated by the mother centriole, functions 
in mammalian embryonic development (24). 
Defective ciliogenesis results in a series of 
human disorders collectively known as cilio- 
pathies (25, 26). The primary cilium is also 
present in adult neurons and regulates glyco- 
metabolism (27, 28). We found that in a subset 
of SCN neurons, cilia exhibit pronounced 
circadian rhythmicity in abundance and length. 
We also identified primary cilia as a critical 
device for intercellular coupling to maintain 
the circadian clock in the SCN. 


Results 
Primary cilia in the SCN exhibit circadian 
rhythmic changes 


We examined the distribution of primary cilia 
in the mouse brain and found that many SCN 
neurons contained primary cilia (Fig. 1, A and B, 
and movie S1). Because the SCN is the master 
circadian pacemaker, we tested whether pri- 
mary cilia of SCN neurons exhibited oscillatory 
rhythms during light-dark (LD) and dark-dark 
(DD) cycles. In mice maintained in LD cycles, 
both the number and length of primary cilia 
showed a pronounced circadian rhythmicity, 
peaking at ZT 0 and reaching a trough at ZT 12
(Fig. 1, C to E), where ZT represents Zeitgeber 
time used in LD cycles, and ZT 0 and ZT 12 are 
lights on and lights off, respectively. This oscil- 
lation was antiphase to that of the clock gene 
Cry1. Rhythmic oscillations of primary cilia 
in the SCN were also observed in DD cycles, 


(. 


peaking at CT 0 and reaching a trough a) Chee 


12 (Fig. 1F and fig. S1, A and B), where CT LZ 
resents circadian time, and CT 0 and CT 12 are 
subjective dawn and dusk, respectively, indicat- 
ing that these ciliary oscillations are not driven 
by light but rather by internal rhythms. We 
next used a transgenic mouse model express- 
ing ADP ribosylation factor-like guanosine 
triphosphatase 13B (ARL13B)-mCherry fusion 
protein to label the axoneme of primary cilia 
and also observed the diurnal oscillations of 
ciliary abundance in the SCN during DD cycles 
(fig. S1C).

Primary cilia in other cerebral regions and 
peripheral tissues, including the paraventricular 
nucleus of the hypothalamus (PVN), hippocam- 
pus, kidney, and pancreas, lacked circadian 
rhythmicity (Fig. 1G and fig. S1, D to G). In 
most cells, ciliary abundance and length are 
tightly connected to cell cycle progression (29). 
In vivo bromodeoxyuridine incorporation 
assays showed that almost all of the cells in 
the SCN were postmitotic neurons (fig. S1H),
indicating that the rhythmicity of cilia is not
coupled to the cell cycle. Bmal1 is a core component
of the mammalian circadian clock, and,
as expected, its deletion abolished circadian
behaviors in mice (30, 31) (fig. S1I). The circadian
oscillation of ciliary abundance was lost in
Bmal1-deficient mice, indicating that the
rhythmicity of cilia is regulated by clock output
(Fig. 1H).

To monitor primary cilia in live SCN neurons 
isolated from postnatal mice, we transduced 
them with modified baculovirus encoding the 
mCherry-tagged constitutively ciliary-localized 
protein 5-hydroxytryptamine receptor 6 (5-HT6), 
and performed time-lapse imaging. Consistent 
with the fixed-slice data, primary cilia displayed 
circadian rhythmic oscillation in length: They 
took ~12 hours to shorten to the minimum 
length and regrew to the maximum length 
during the next 12 hours (movie S2; Fig. 1, 
I and J; and fig. S1J). The abundance of 
Cry1 in ciliated cells was lower than that in
nonciliated cells (fig. S1, K and L), consistent 
with our in vivo observations. 


Primary cilia confer robustness to the intrinsic 
circadian clock 


The SCN consists of multiple types of neu- 
rons, including AVP-expressing neurons, 
VIP-expressing neurons, and NMS-expressing 
neurons. NMS neurons represent 40% of all 
SCN neurons and encompass most VIP- and 
AVP-expressing neurons (32). To identify the 
cell types of ciliated neurons in the SCN, we 
crossed Nms-Cre or Vip-Cre mice with Rosa- 
stop-tdTomato reporter mice to label NMS or 
VIP neurons. Type III adenylyl cyclase (ACIII)
is a prominent marker of primary cilia through- 
out the brain. As revealed by ACIII staining of 
SCN coronal sections, we found that 90% of 
ciliated neurons were NMS neurons and 31% 
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Fig. 1. Primary cilia in the SCN exhibit circadian 
changes in abundance and length. (A) Schematic 
diagram of the SCN. (B) Representative three-dimensional 
reconstructed projection images of the SCN at 20x 
magnification of the two-photon imaging. SCN slices 
were stained with anti-ACIII (primary cilia marker, green).
Scale bars, 50 μm. (C) Representative images of
primary cilia and expression of clock gene Cry1 in the
SCN during the LD cycle. SCN coronal sections
were stained with anti-ACIII (green), Cry1 RNAscope
probes (red), and Hoechst (blue). Insets show enlarged
views of the boxed regions in the SCN. Scale bars,
100 μm (main image) and 20 μm (magnified region).
(D) Percentage of cells with primary cilia or Cry1
RNAscope signals in the SCN determined based on (C).
(E) Quantitative analysis of the cilium length in (C).
(F) Percentage of cells with primary cilia or Per1
RNAscope signals in the SCN quantified during the DD
cycle. (G) Percentage of cells with primary cilia or
Cry1 RNAscope signals in the PVN determined during
the DD cycle. (H) Percentage of cells with primary
cilia in the SCN for wild-type and Bmal1−/− mice during
the DD cycle. (I) Representative time-lapse images
of the primary cilium in SCN neurons for 48 hours.
Isolated live SCN neurons from postnatal mice were
transduced with modified baculovirus encoding mCherry-tagged
5-HT6. The numbers on the images indicate
the time. Arrows indicate primary cilia. Scale bar, 5 μm.
(J) Quantitative analysis of the cilium length in (I).
Each line represents the oscillation of cilium length
in an individual neuron. All data are presented as
mean ± SEM. Statistics indicate significance by
one-way ANOVA with Tukey’s correction [(D) to
(G)] or two-way ANOVA with Bonferroni correction
(H). n = 3 mice per time point. ***P < 0.001;
ns, not significant.


were VIP neurons (Fig. 2, A and B, and fig. 
S2A). We performed double immunostain- 
ing using antibodies to ACIII and AVP and 
found that only 5% of ciliated cells were AVP 
neurons (Fig. 2, A and B). Thus, primary cilia 
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are mainly present on NMS neurons in the 
SCN (Fig. 2C). 

We disrupted primary cilia specifically in 
NMS neurons by conditionally deleting Ift88 or
Ift20, two genes required for ciliogenesis (33-35).
Primary cilia on SCN neurons were abolished
in Nms-Ift88−/− or Nms-Ift20−/− mice (fig. S2, B
to E). By contrast, the numbers of primary cilia
in other tissues were comparable between
control and Nms-Ift88−/− or Nms-Ift20−/− mice
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Fig. 2. Primary cilia in NMS neurons confer robustness to the intrinsic 
circadian clock. (A) Representative images of primary cilia in multiple 

types of SCN neurons. For NMS and VIP neurons, SCN coronal sections from 
Nms-tdTomato or Vip-tdTomato mice were stained with antibody to ACIII (green) 
and Hoechst (blue). For AVP neurons, SCN coronal sections were stained with 
anti-ACIII (green), anti-AVP (red), and Hoechst (blue). Insets show enlarged 
views of the boxed regions in the SCN. Scale bars, 100 μm (main image) and
10 μm (magnified region). (B) Quantitative analysis of the percentage of ciliated
cells with the indicated neuropeptide in (A) (n = 3). (C) Distribution diagram of 
primary cilia in NMS, AVP, and VIP neurons. (D) Representative double-plotted 
actogram of mice under experimental jet-lag conditions. Activity records are 
double plotted so that 48 hours are represented horizontally. Each 24-hour 
interval is presented both to the right of and beneath the preceding day. Mice 
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were maintained in LD cycles for 14 days, then the cycle was advanced 

8 hours, and 15 days later, the cycle was returned to the original lighting regime. 
(E) Line graphs showing the daily phase shift of wheel-running activities after 
an 8-hour advance in (D) (n = 10). (F) PS50 values after an 8-hour advance in
(D) (n = 10). (G) Representative double-plotted actograms of mice subjected 
to an 8-hour phase advance on day 1 and released to DD. (H) Line graphs 
showing the daily phase shift of wheel-running activities in (G) (n = 10). For 
this and subsequent figures, wheel-running activity is indicated by black
markings. White and pink backgrounds indicate lights on and lights off,
respectively. The red lines on the actograms indicate the phase of activity
onset or offset. All data are presented as mean ± SEM. Statistics indicate
significance by one-way ANOVA with Dunnett correction (F). ***P < 0.001; 
ns, not significant. 
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Fig. 3. Primary cilia promote interneuronal coupling in the SCN. 

(A) Representative records of single-cell Per2 bioluminescence in SCN slices 
under three conditions: pretreatment, TTX treatment, and after washout of TTX 
(n = 50 cells). Arrows indicate TTX washout by medium changes. (B) Left, 
representative heatmap of Per2 bioluminescence oscillation in (A). Red and 
green represent high and low bioluminescence intensity, respectively. Right, 
Rayleigh plots showing the phase distribution of single cells during the third day 
(midpoint) in (A). Arrows represent mean circular phase, and the length 

of the arrow represents the strength of synchronization. (C) Homogeneity 

of the single-cell phase from multiple replicate SCN slices evaluated with length 


(fig. S2, F and G). Deletion of Ift88 or Ift20
in NMS neurons did not affect mouse de- 
velopment or the morphology of the SCN 
(fig. S3). 

We also investigated whether primary cilia 
contribute to the pacemaker function of the 
SCN by monitoring the locomotor activity of 
Nms-Ift88−/− or Nms-Ift20−/− mice, referred to
hereafter as SCN^cilia-null mice. Under LD cycles,
both control (Nms-Cre, Ift88^fl/fl, and Ift20^fl/fl)
and SCN^cilia-null mice exhibited normal locomotor
activity with a wheel-running period of 24 hours
(fig. S4A). Under DD cycles, control mice exhibited
intrinsic periods of 23.6 ± 0.1 hours (fig.
S4, A and B). SCN^cilia-null mice had a moderately
elongated intrinsic period: 24.0 ± 0.1 hours
for Nms-Ift88−/− mice and 23.9 ± 0.1 hours for
Nms-Ift20−/− mice (fig. S4, A and B).




of the Rayleigh plot vector in (B) (n = 3). (D) Representative records of Per2 
bioluminescence rhythms in SCN slices. SCN slices were exposed to 

12 hours of 36°C and 12 hours of 38.5°C temperature cycles or oppositely 
phased temperature cycles for 3 days, and their bioluminescence was then 
monitored continuously at a constant 36°C. Bioluminescence was normalized 
to the first peak. (E) Quantitative analysis of the peak time of Per2 
bioluminescence after the temperature cycles in (D) (n = 8). Data are 
presented as mean ± SEM. Statistics indicate significance by Watson-Wheeler
test (B) or two-way ANOVA with Bonferroni correction [(C) and (E)].
***P < 0.001; ns, not significant.


We further examined the behavior of SCN^cilia-null
mice under experimental jet-lag conditions.
Mice were maintained in normal LD cycles for
14 days, the cycle was advanced by 8 hours,
and 15 days later, the cycle was returned to the
original lighting regime. Under LD cycles,
both control and SCN^cilia-null mice successfully
entrained to the LD schedule. When the lighting 
cycle was advanced, control mice re-entrained 
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Fig. 4. Hedgehog signaling maintains circadian rhythm in the SCN. 

(A) Representative double-plotted actogram of Nms-Cre, Smo^fl/fl, and Nms-Smo−/−
mice under experimental jet-lag conditions. (B) Line graphs showing the 

daily phase shift of wheel-running activities after an 8-hour advance in (A) 

(n = 10). (C) Representative double-plotted actograms of mice treated with 

vehicle (left) or 5 mM vismodegib (right) under experimental jet-lag conditions. 
Vismodegib was applied to the SCN by osmotic minipump. Asterisks indicate time of 
surgery. Three days after surgery, LD cycles were advanced by 8 hours. (D) Line 
graphs showing the daily phase shift of wheel-running activities after an 8-hour 
advance in (C) (n = 11 for the vehicle group, n = 12 for the vismodegib group).
(E) Representative double-plotted actogram of Shh^fl/fl-AAV and Shh^fl/fl-AAV-Cre mice
under experimental jet-lag conditions. (F) Representative double-plotted actogram
of Ptch1^fl/fl-AAV and Ptch1^fl/fl-AAV-Nms-Cre mice under experimental jet-lag
conditions. (G) Heatmap of Per2 bioluminescence oscillation under three conditions: 
pretreatment, vismodegib treatment, and after washout of vismodegib (n = 50 
cells). (H) Rayleigh plots of phase distribution of single cells in (G) during the third 
day. (I) Homogeneity of the single-cell phase from three replicate SCN slices was 
evaluated with length of the Rayleigh plot vector in (G). All data are presented 

as mean ± SEM. Statistics indicate significance by Watson-Wheeler test (H)

or one-way ANOVA with Bonferroni correction (I). ***P < 0.001. 
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Fig. 5. Ciliary Hedgehog signaling regulates the rhythms of clock genes 
and neuropeptides. (A and B) Quantitative real-time polymerase chain reaction 
(PCR) analysis of core clock genes (A) and neuropeptides (B) in the SCN. The 
SCN was collected at 4-hour intervals across the LD cycle. (C) Representative 
immunostaining images of Vip and Grp in the dorsal and ventral SCN at ZT 20.
SCN coronal sections were stained with anti-Vip (green), anti-Grp (red), and
Hoechst (blue). Insets show enlarged views of the boxed regions. Scale bars,
100 μm (main image) and 10 μm (magnified region). (D) Quantification of the
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relative intensity of Vip and Grp in (C). (E) Summary diagram showing primary 
cilia in SCN neurons and downstream Hedgehog signaling as critical regulatory 
mechanisms promoting interneuronal coupling, thereby maintaining SCN network 
synchrony and circadian rhythms. All data are presented as mean ± SEM.
Statistics indicate significance by one-way ANOVA (D) or two-way ANOVA
[(A) and (B)] with Bonferroni correction. n = 3 independent experiments for
Nms-Ift88−/− versus Ift88^fl/fl and Nms-Smo−/− versus Smo^fl/fl. *P < 0.05;
**P < 0.01; ***P < 0.001.
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progressively over 9 to 11 days (Fig. 2, D and E). 
By contrast, SCN^cilia-null mice re-entrained more
quickly, and the entrainment was complete
within 1 to 3 days (Fig. 2, D and E). We then
calculated the time at which half the phase
shift was completed (PS50). The PS50 value of
control mice was about 5.4 days, whereas
that of Nms-Ift88−/− and Nms-Ift20−/− mice was
0.7 ± 0.2 and 0.9 ± 0.2 days, respectively (Fig.
2F). When the cycle was returned to the original
lighting regime, control mice re-entrained
progressively over several days (Fig. 2D and fig. S4,
C and D), whereas SCN^cilia-null mice again
re-entrained immediately to the LD cycle. These
data indicate that primary cilia in NMS neurons 
influence the entrainment of circadian rhythms. 
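
One simple way to read a PS50 value off a daily phase-shift curve is to interpolate linearly between the two days that bracket half of the total shift. The sketch below assumes that interpolation rule; the function name `ps50` and the example data are hypothetical, and the authors' actual fitting procedure (which may, for example, fit a sigmoidal curve) is not given in this text.

```python
def ps50(shift_h, total_shift_h=8.0):
    """Day (possibly fractional) at which half of total_shift_h is completed.

    shift_h[d] is the cumulative phase shift in hours on day d after the
    lighting change (day 0 = day of the change). Returns None if the
    half-way point is never reached.
    """
    half = total_shift_h / 2.0
    if shift_h and shift_h[0] >= half:
        return 0.0
    for day in range(1, len(shift_h)):
        prev, curr = shift_h[day - 1], shift_h[day]
        if prev < half <= curr:
            # Linear interpolation between the two bracketing days.
            return (day - 1) + (half - prev) / (curr - prev)
    return None


# Made-up cumulative shifts (hours) for a slow and a fast re-entrainer.
slow = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 7.5, 8.0]
fast = [0.0, 6.0, 8.0, 8.0]
print(ps50(slow), ps50(fast))
```

With an 8-hour advance, the half-way point is 4 hours, so a mouse that shifts most of the way on the first day yields a PS50 below 1 day, in line with the sub-day values reported for the cilia-null mice.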

Vip-Ift88−/− or Vip-Ift20−/− mice also re-entrained
more quickly than did control mice
(4 to 6 days versus 9 to 11 days) (fig. S5, A to E).
However, these mice took more time to be re-entrained
than did Nms-Ift88−/− or Nms-Ift20−/−
mice (1 to 3 days), presumably because some
ciliated neurons remained in Vip-Ift88−/−
or Vip-Ift20−/− mice (fig. S5, F and G). Avp-Ift88−/−
or Avp-Ift20−/− mice behaved normally under
experimental jet-lag conditions, as primary 
cilia were not affected in these mice (fig. S6). 
These data further support a specific role 
of cilia in NMS neurons in circadian rhythm 
entrainment. 

There were no differences between control
and SCN^cilia-null mice in the number of c-Fos+
cells induced by a 30-min light pulse at CT 22
(fig. S7), indicating that the light response of
SCN^cilia-null mice was similar to that of control
mice. These results demonstrate that SCN^cilia-null
mice are more adaptive to phase shifts during
LD cycles, suggesting that primary cilia influence
resistance to environmental time cues.

To exclude the effect of light on activity of 
mice, we subjected mice to constant darkness 
(DD) after an 8-hour phase advance in an LD 
protocol. Under DD, SCN^cilia-null mice exhibited
significant phase advances in the free-running
behavior, whereas control mice did not (Fig. 2,
G and H). This finding suggests that the
immediate adaptation to a new LD cycle in
SCN^cilia-null mice reflects a rapid phase shift of the
internal clock, not masking by the environmental
LD cycle.


Primary cilia promote interneuronal coupling 
in the SCN 


SCN neurons rely on intercellular coupling to 
maintain intrinsic circadian behavior. Intercel- 
lular coupling confers robustness to neuronal 
networks and synchronizes periods of individ- 
ual cellular oscillators. Per2::Luciferase (Per2:: 
Luc) transgenic reporter mice can be used to 
track Per2 rhythmic expression in single cells 
ex vivo. To test whether primary cilia are re- 
quired for intercellular coupling among SCN 
neurons, we used real-time luciferase luminescence
imaging of SCN slices isolated from
control (Nms-Cre and Ift88^fl/fl) or Nms-Ift88−/−
mice that expressed the Per2::Luc reporter.
Circadian rhythmicity of the analyzed SCN
cell population was coherent in both control
and Nms-Ift88−/− slices under normal culture
conditions (Fig. 3, A to C). We then applied
tetrodotoxin (TTX), a sodium ion channel
blocker, to disrupt the intercellular coupling of
SCN neurons. TTX disrupted the phase order
of both control and Nms-Ift88−/− SCN neurons
(Fig. 3, A to C). After the washout of TTX, control
neurons recovered their coherent phase
order, whereas Nms-Ift88−/− neurons failed to
do so (Fig. 3, A to C; fig. S8; and movies S3 to
S5). Thus, primary cilia in NMS neurons appear
to promote intercellular coupling.
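
The "strength of synchronization" read from a Rayleigh plot is the length of the mean resultant vector of the single-cell phases: close to 1 when cells peak together and close to 0 when their phases are spread around the clock. The following is a minimal sketch of that calculation, assuming phases are expressed as peak times in hours on a 24-hour cycle; the function name and the example phases are hypothetical and are not taken from the authors' analysis code.

```python
import math


def mean_resultant(phases_h, period_h=24.0):
    """Return (R, mean_phase_h) for a list of peak times in hours.

    R is the mean resultant length (0 = no phase order, 1 = perfect
    synchrony); mean_phase_h is the circular mean on [0, period_h).
    """
    angles = [2.0 * math.pi * p / period_h for p in phases_h]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    r = math.hypot(c, s)
    mean_phase = (math.atan2(s, c) * period_h / (2.0 * math.pi)) % period_h
    return r, mean_phase


# Made-up examples: tightly clustered peaks versus evenly scattered peaks.
coupled = [11.5, 12.0, 12.5, 12.0]
scattered = [0.0, 6.0, 12.0, 18.0]
print(mean_resultant(coupled)[0], mean_resultant(scattered)[0])
```

Comparing this vector length before TTX, during TTX, and after washout is one way to quantify whether a slice recovers its phase order.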
Intercellular coupling in the SCN is required 
for resistance to physiological temperature 
changes (17). We therefore tested whether pri- 
mary cilia in the SCN contributed to the re- 
sistance to cyclic temperature entrainment. 
Control and Nms-Ift88−/− SCN slices were exposed
to 12 hours of 36°C and 12 hours of 38.5°C
temperature cycles (normal body temperature
cycles) or oppositely phased temperature cycles
for 3 days. Both control and Nms-Ift88−/− SCN
slices maintained their phases of Per2 bioluminescence
after normal cyclic temperature
entrainment. After oppositely phased temperature
cycles, the phase of control SCN slices
remained unchanged, but the phase of Nms-Ift88−/−
SCN slices was clearly shifted (Fig.
3, D and E). Thus, the cilia-null SCN was less
resistant to cyclic temperature changes. This 
resistance likely stems from cilia-dependent 
intercellular coupling in the SCN. 


Hedgehog signaling maintains circadian 
rhythm in the SCN 


To investigate how primary cilia in NMS neu- 
rons mediate intercellular coupling, we exam- 
ined the functions of Avp and Vip receptors, two 
well-known heterotrimeric G-protein coupled 
receptors (GPCRs) in the SCN (19, 20). Avp and 
Vip receptors did not localize to cilia, and Vip 
receptor function remained normal in Nms-Ift88−/−
mice (fig. S9). The primary cilium is a
critical organelle that regulates Sonic Hedgehog
(Shh) signaling (36-38). In a genome-wide
screen, inhibition of the Hedgehog pathway
lengthened the period of circadian oscillations
in U2OS cells (39). Shh genes were
broadly expressed in SCN neurons (fig. S10, A
and B). The addition of Shh could induce the
expression of the downstream gene Gli1 in
NMS neurons, and this effect was abolished
in Nms-Ift88−/− mice (fig. S10, C and D). The
expression of Gli1 and Ptch1, two Shh signaling
target genes, exhibited rhythmic oscillation in
the SCN under both LD and DD conditions
(fig. S10E). This rhythmic oscillation was lost
in Nms-Ift88−/− mice or Bmal1-deficient mice
(fig. S10, F and G), so Shh signaling may function
in the regulation of the central clock.


The enrichment of smoothened (SMO) on 
cilia is required to initiate Hedgehog signaling 
(40). We generated Nms-Smo−/− mice to block
Shh signaling in the SCN (fig. S11A). Although
Nms-Smo−/− mice did not exhibit overt developmental
defects (fig. S11, B to E), we could not
completely rule out subtle off-target developmental
effects. Under experimental jet-lag
conditions, Nms-Smo−/− mice re-entrained immediately
to the LD cycle (Fig. 4, A and B, and
fig. S11, F and G). We next applied an SMO
inhibitor, vismodegib, to the SCN through an
osmotic minipump in live animals during experimental
jet-lag conditions. Vismodegib
caused immediate re-entrainment to the LD
cycle (Fig. 4, C and D, and fig. S12A). Moreover,
vismodegib elicited a reversible dose-dependent
suppression of Per2::Luc oscillations in SCN
slices (fig. S12, B to D), and this inhibitory effect
was abolished in Nms-Smo−/− SCN slices
(fig. S12, E and F). We also found that the SMO
agonist SAG induced Per2 expression and
significantly delayed the phase of the SCN
circadian oscillation by up to 4 hours (fig. S12,
G and H). These effects were abolished in
Nms-Smo−/− SCN slices. These results demonstrate
that Shh signaling is required for
the resistance of the internal clock to environmental
time cues.

We generated Shh conditional knockout mice
by bilateral injection of adeno-associated virus
(AAV) expressing Cre recombinase into the SCN
of Shh^fl/fl mice (fig. S13, A and B). Similar to
Nms-Smo−/− mice, mice lacking Shh in the
SCN re-entrained immediately to the LD cycle
under experimental jet-lag conditions (Fig. 4E
and fig. S13, C and D). To test the effect of enhanced
Shh signaling in the SCN, we deleted
Ptch1, a key negative regulator of Shh signaling,
in NMS neurons in mice (41) (fig. S13E).
Nms-Ptch1−/− mice re-entrained immediately to the
LD cycle under experimental jet-lag conditions
(Fig. 4F and fig. S13, F and G). Thus,
both rhythmic oscillation of Shh signaling
and its amplitude influence coupling in the
central clock.

To test whether Shh signaling also functions 
in interneuronal coupling in the SCN, we an- 
alyzed circadian rhythms in SCN slices using 
the single-cell real-time luciferase luminescence 
imaging assay. Similar to Nms-Ift88−/− neurons,
Nms-Smo−/− neurons failed to recover their
phase order after the washout of TTX (fig. S14). 
The Smo inhibitor vismodegib also reversibly 
disrupted the phase order of Per2::Luc in SCN 
slices (Fig. 4, G to I, and movie S6). Thus, Shh 
signaling may promote intercellular coupling 
among SCN neurons. 


Ciliary Hedgehog signaling regulates the 
rhythms of clock genes and neuropeptides 


Because the circadian clock controls the tran- 
scription of multiple genes that are important 
for interneuronal communications, we examined
the expression of core clock genes. The rhythmicity
of core clock genes, including Per1, Cry1,
Bmal1, and Clock, was altered in Nms-Ift88−/−
and Nms-Smo−/− mice (Fig. 5A). We then investigated
the rhythmic expression of neuropeptides
that are known to mediate intercellular
coupling. The rhythmicity of several neuropeptide
genes, including Vip, gastrin-releasing
peptide (Grp), Avp, and prokineticin 2 (Prok2),
was altered in Nms-Ift88−/− and Nms-Smo−/−
mice (Fig. 5B). As shown by our immunofluorescence
assay, the protein concentrations of
Vip and Grp were significantly decreased in
Nms-Ift88−/− and Nms-Smo−/− SCN (Fig. 5, C
and D). Thus, cilia depletion in SCN neurons
leads to dampened oscillations of core clock 
genes and neuropeptides. 


Discussion 


The SCN drives coherent and synchronized 
circadian oscillations that are resistant to en- 
vironmental perturbations, and this resistance 
relies on interneuronal coupling. Our findings 
establish primary cilia in NMS neurons and 
downstream Shh signaling as critical regula- 
tory mechanisms that promote interneuronal 
coupling, thereby maintaining SCN network 
synchrony and circadian rhythms (Fig. 5E). 
The rhythmic cilia changes in the SCN lead 
to rhythmic oscillation of Shh signaling, which 
in turn drives rhythmic expression of core 
clock genes and neuropeptides. This feed- 
forward loop sustains robust and coherent 
oscillation at the cell population level, making 
the intrinsic clock resistant to environmental
perturbations.

Primary cilia are hubs for ACIII-mediated 
ciliary cyclic adenosine 3’,5'-monophosphate 
(cAMP) production. Cytoplasmic cAMP signal- 
ing is implicated in the SCN pacemaking 
function (42). Although the ratio between the 
volumes of whole cell and cilia is 5000:1, we 
speculate that ciliary cAMP could still influ- 
ence the behavior of the entire cell through 
signaling amplification during SCN clock regulation.
It will be interesting to investigate
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whether ciliary cAMP could also be involved 
in cytoplasmic cAMP signaling during the 
SCN pacemaking function. 

Epidemiological studies have linked frequent 
cross-time-zone travel and shift work to high 
blood pressure, obesity, and other metabolic 
disorders. Our results show that pharmaco- 
logical blockade of the Shh pathway accel- 
erates recovery from experimental jet lag in 
mice. Targeting Shh signaling might be a 
potential therapeutic strategy for the treatment
of human diseases related to circadian
disruptions.

Technical Comment on “Policy impacts of statistical uncertainty and privacy”

Yifan Cui, Ruobin Gong, Jan Hannig, Kentaro Hoffman


Steed et al. (1) illustrate the crucial impact that the quality of official statistical data products may
exert on the accuracy, stability, and equity of policy decisions on which they are based. The authors 
remind us that data, however responsibly curated, can be fallible. With this comment, we underscore 
the importance of conducting principled quality assessment of official statistical data products. We 
observe that the quality assessment procedure employed by Steed et al. needs improvement, due to 
(i) the inadmissibility of the estimator used, and (ii) the inconsistent probability model it induces on the 
joint space of the estimator and the observed data. We discuss the design of alternative statistical 
methods to conduct principled quality assessments for official statistical data products, showcasing 
two simulation-based methods for admissible minimax shrinkage estimation via multilevel empirical 
Bayesian modeling. For policymakers and stakeholders to accurately gauge the context-specific usability 
of data, the assessment should take into account both uncertainty sources inherent to the data and 
the downstream use cases, such as policy decisions based on those data products. 


We motivate the proposed assessment
framework by considering Title I funding
allocation by the U.S. Department
of Education using the U.S. Census
Bureau’s Small Area Income and Poverty
Estimates (SAIPE) dataset studied by
Steed et al. (1). Let $\mu = (\mu_1, \ldots, \mu_k)$ be the true
population counts for children under poverty
in districts $i = 1, \ldots, k$, and $x = (x_1, \ldots, x_k)$ be
the official SAIPE poverty estimates. Denote
by $y: \mathbb{N}^k \to \mathbb{R}_+^k$ the entitlement function,
that is, $y(x) = (y_1(x), \ldots, y_k(x))$ are the districts’
official entitlements (in USD) based on
$x$, and $y(\mu)$ the true entitlements were the
true poverty population $\mu$ known. Finally, let
$L(\cdot\,;\cdot)$ be a loss function that measures the
misallocation of funding between $y(x)$ and
$y(\mu)$. The assessment estimates the average
loss between the ideal and the realized
allocations:


$$\mathbb{E}\bigl(L(y(\mu); y(x)) \mid x\bigr) \qquad (1)$$


with expectation taken over what we denote
as $p_c(\mu \mid x)$, the available distributional information
about the true poverty counts given
the observed estimate $x$ and any auxiliary parameter
$c$. The parameter $c$ may encode known
information about the variability in the observed
estimates, such as their sampling or
model-based variance. When $p_c(\mu \mid x)$ is given,
Eq. 1 can be approximated via simulation:


$$\frac{1}{T} \sum_{t=1}^{T} L\bigl(y(\mu^{(t)}); y(x)\bigr) \qquad (2)$$

where $\mu^{(t)} \sim p_c(\cdot \mid x)$ i.i.d. for some large $T$. This
assessment is uncertainty- and policy-aware by
the specifications of $p_c(\mu \mid x)$ and $L$, respectively.
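
The simulation in Eq. 2 can be sketched in a few lines. Everything concrete below is a stand-in chosen for illustration: `toy_entitlement` is a hypothetical proportional split of a fixed budget (not the Title I formula), `misallocation` is one possible loss, and `normal_sampler` is a placeholder for $p_c(\cdot \mid x)$. Only the averaging over $T$ draws corresponds to Eq. 2.

```python
import random


def toy_entitlement(counts, budget=1_000_000.0):
    """Hypothetical rule: split a fixed budget in proportion to counts."""
    total = sum(counts)
    return [budget * c / total for c in counts]


def misallocation(ideal, realized):
    """Hypothetical loss: funding that districts are owed but do not receive."""
    return sum(max(i - r, 0.0) for i, r in zip(ideal, realized))


def expected_loss(x, sample_mu, T=2000, seed=0):
    """Monte Carlo average of L(y(mu_t); y(x)) over T draws mu_t ~ p_c(. | x)."""
    rng = random.Random(seed)
    realized = toy_entitlement(x)
    total = 0.0
    for _ in range(T):
        mu_t = sample_mu(x, rng)
        total += misallocation(toy_entitlement(mu_t), realized)
    return total / T


def normal_sampler(cv):
    """Placeholder p_c: independent normals around x_i with sd = cv * x_i,
    floored at 1 so that simulated counts stay positive."""
    def draw(x, rng):
        return [max(rng.gauss(xi, cv * xi), 1.0) for xi in x]
    return draw


x = [120.0, 450.0, 80.0, 1500.0]          # made-up district estimates
loss = expected_loss(x, normal_sampler(cv=0.2))
print(round(loss, 2))
```

Passing the sampler in as an argument keeps the two ingredients separate: swapping `sample_mu` changes the uncertainty model, while swapping `misallocation` or `toy_entitlement` changes the policy context.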

Typically, the loss function $L$ is chosen by
the assessor depending on the policy context,
whereas $p_c(\mu \mid x)$ relies on information available
to the assessor. Following Steed et al. (1),
the available information is (i) the upper bounds
on the coefficients of variation, $c = (c_1, \ldots, c_k)$, suggested
by the Census Bureau; and (ii) that $x$ is
approximately normally distributed around $\mu$.
That is,


$$x \mid \mu \sim N(\mu, \operatorname{diag}(v)) \qquad (3)$$


where $v = (v_1, \ldots, v_k)$, with $v_i = (c_i x_i)^2$, are the sampling
variances of $x$.

Steed et al. employ a simulation procedure
[section 2 of the supplementary materials in
(1)] that approximates Eq. 1 by using $x$ as a
plug-in estimate for $\mu$, and producing replicates
of $x$ using Eq. 3 based on this plug-in
estimate. Understood within our proposal,
this procedure amounts to simulating $\mu^{(t)}$
replicates ($T = 1000$) under the following
choice of $p_c(\mu \mid x)$:


$$\mu \mid x \sim N(x, \operatorname{diag}(v)) \qquad (4)$$


This is not ideal for two reasons. First, each
$\mu^{(t)}$ generated through Eq. 4 is inadmissible
as an estimate of the true poverty count $\mu$, a classic
observation from Charles Stein (2, 3). Second, Eq. 3
and Eq. 4 together do not admit a consistent
joint probability distribution for $(\mu, x)$ (4),
exposing the procedure to potential paradoxical
conclusions [e.g., (5)].

How should the assessor construct $p_c(\mu \mid x)$?
There is unlikely to be a unique “best” approach for
all contexts, but reasonable starting points
exist. Here, we discuss a class of distributions
derived via multilevel empirical Bayesian modeling
that accord with admissible and minimax
shrinkage estimates for $\mu$. The class follows
the general form

$$\mu_i \mid x \sim N\bigl((1 - B_i)\,x_i + B_i\,\beta,\; (1 - B_i)\,v_i\bigr), \quad i = 1, \ldots, k \qquad (5)$$


where $\beta$ and all $B_i \in [0, 1]$ are functions of $x$ and
the auxiliary $c$. The method is called shrinkage
because, compared to Eq. 4, it adjusts each
poverty estimate $\mu_i$ based on the observed $x_i$
to account for a common baseline $\beta$ with a
$100 B_i\%$ variance reduction. This restores consistency
on the joint specification of $(\mu, x)$
whenever $B_i \neq 0$.

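Two possible constructions of Eq. 5 are (i)
the Hudson-Berger (HB) construction (6, 7),
for which $\beta = 0$ and

$$B_i^{\mathrm{HB}} = \min\left\{1,\; \frac{(k - 2)/v_i}{\sum_{j=1}^{k} (x_j / v_j)^2}\right\} \qquad (6)$$

and (ii) the Morris-Lysy (ML) construction
(8), for which $\beta = \bar{x}$,

$$B_i^{\mathrm{ML}} = \frac{v_i}{v_i + \bar{v}_h (1 - \bar{B}) / \bar{B}} \qquad (7)$$

where $\bar{v}_h = k / \sum_{i=1}^{k} v_i^{-1}$ is the harmonic
mean of the $v_i$’s, $\bar{B} = (k - 3)\,\bar{v}_h / \{(k - 1)\,\hat{\sigma}^2\}$, and
$\hat{\sigma}^2 = (k - 1)^{-1} \sum_{i=1}^{k} (x_i - \bar{x})^2$ is the mean
square error in the observed poverty counts.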

Both constructions cater to unequal sampling
variances $v_i$. They differ in that Hudson-Berger
exerts stronger shrinkage for smaller $v_i$,
whereas Morris-Lysy does so for larger $v_i$. Due to
the heavy tail of the SAIPE poverty estimates
$x$ and increasing $c_i$ for larger $x_i$, we apply the
Morris-Lysy method on the observed poverty
proportion $x_i / n_i$ (rather than $x_i$), where $n_i$ is
the total population of district $i$, in order to
mitigate overly strong shrinkage effects.
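
To make the two constructions concrete, the sketch below computes shrinkage factors and draws one replicate from Eq. 5. It is an illustration under stated assumptions, not the authors' implementation: the Hudson-Berger factor is taken as $\min\{1, ((k-2)/v_i) / \sum_j (x_j/v_j)^2\}$, the Morris-Lysy factor as $v_i / (v_i + \bar{v}_h (1 - \bar{B})/\bar{B})$, and the average shrinkage $\bar{B}$ is assumed to be $(k-3)\bar{v}_h / ((k-1)\hat{\sigma}^2)$ clipped at 1. The function names, the clipping, and the example data are hypothetical.

```python
import random


def hb_factors(x, v):
    """Hudson-Berger-style factors (baseline beta = 0)."""
    k = len(x)
    denom = sum((xj / vj) ** 2 for xj, vj in zip(x, v))
    return [min(1.0, ((k - 2) / vi) / denom) for vi in v]


def ml_factors(x, v):
    """Morris-Lysy-style factors (baseline beta = mean of x)."""
    k = len(x)
    xbar = sum(x) / k
    v_h = k / sum(1.0 / vi for vi in v)            # harmonic mean of v
    s2 = sum((xi - xbar) ** 2 for xi in x) / (k - 1)
    # Assumed form of the average shrinkage, clipped so it stays in (0, 1].
    bbar = min(1.0, (k - 3) * v_h / ((k - 1) * s2))
    a = v_h * (1.0 - bbar) / bbar
    return [vi / (vi + a) for vi in v]


def draw_mu(x, v, B, beta, rng):
    """One replicate: mu_i ~ N((1 - B_i) x_i + B_i beta, (1 - B_i) v_i)."""
    return [rng.gauss((1 - b) * xi + b * beta, ((1 - b) * vi) ** 0.5)
            for xi, vi, b in zip(x, v, B)]


# Made-up estimates with a common coefficient of variation of 0.2.
x = [120.0, 450.0, 80.0, 1500.0, 300.0]
v = [(0.2 * xi) ** 2 for xi in x]
hb, ml = hb_factors(x, v), ml_factors(x, v)
rng = random.Random(0)
print(draw_mu(x, v, hb, 0.0, rng))
print(draw_mu(x, v, ml, sum(x) / len(x), rng))
```

Printing `hb` and `ml` side by side shows how each construction distributes shrinkage across districts with different sampling variances; setting every factor to 0 recovers the plug-in draw of Eq. 4.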

We compare the proposed approaches with 
the evaluation of Steed et al. (1). The top panel 
of Fig. 1 compares the quantiles of the ex- 
pected Hudson-Berger and Morris-Lysy pov- 
erty estimates with the SAIPE estimates. The 
bottom panel displays poverty estimate repli- 
cates generated through the constructions of 
Hudson-Berger, Morris-Lysy, and Eq. 2 for four 
districts with different population sizes (at 1, 
5, 50, and 100% quantiles). For small counts, 
Hudson-Berger shrinks strongly resulting in 
nearly constant $\mu^{(t)}$ replicates, whereas its replicates
are comparable to those of Steed et al. (1) for
larger counts. On the other hand, the Morris- 
Lysy method exhibits a varying and moderate 
shrinkage effect at all count levels. 

Table 1 displays the estimated lost entitle- 
ment based on the three approaches, with 
data error alone and with differential privacy
Fig. 1. (Top) Quantile-quantile comparisons of expected Hudson-Berger (left) and Morris-Lysy (right) poverty estimates with SAIPE estimates (log10). (Bottom)
Boxplots of 10* poverty estimate replicates from the Hudson-Berger, Morris-Lysy, and Steed et al. (1) constructions for four districts with total population sizes at 1, 5, 


50, and 100% quantiles. 


Table 1. Estimated lost entitlements (in USD, billions) due to data error (left) and due to data and
privacy error (middle; ε = 0.1) according to each assessment construction. Additional loss due to
privacy (percent) is shown on the right.

Method | data error (s.e.)
Steed et al. (1) | 1.058 (0.03)
Hudson-Berger | 1.060 (0.032)
Morris-Lysy | 2.385 (0.044)

protection (« = 0.1) applied to the observed 
SAIPE estimates first. These results are repro- 
duced and/or implemented using the code 
provided by Steed et al. (1). The Hudson-Berger 
assessment agrees closely with the assessment 
by Steed et al. (1), putting the expected lost 
entitlement at $1.06 billion due to data error 
and an additional 4.65% due to privacy pro- 
tection. The Morris-Lysy assessment estimates 
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data + privacy error (s.e.) diff. (%) 

eer. 1109 (0.031) nnunnnnnnnennt 22 

cmon CSO) pele eee, 4.650 
2.429 (0.044) 1.840 


the lost entitlement at $2.385 billion due to 
data error, and an additional 1.84% due to 
privacy protection. The code we used to con- 
duct these experiments relies in part on the 
public codebase that accompanies Steed et al. 
(7) and can be found at https://github.com/ 
khoffm4/dp-policy-shrink. 
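
As a quick arithmetic check on Table 1, the "diff. (%)" column appears to be the relative increase in lost entitlement once privacy noise is added on top of data error. The short Python sketch below recomputes it from the legible table entries. The function name `additional_loss_pct` is ours, the Steed et al. data + privacy value is read as 1.109 (the decimal point is an assumption), and small differences from the published figures are expected because the inputs are rounded.

```python
def additional_loss_pct(data_loss, data_privacy_loss):
    """Relative increase in lost entitlement from adding privacy noise, in percent."""
    return 100.0 * (data_privacy_loss - data_loss) / data_loss

# Legible Table 1 entries in USD billions: (data error, data + privacy error).
# The Steed et al. second value is read as 1.109 (decimal point assumed).
table1 = {
    "Steed et al.": (1.058, 1.109),
    "Morris-Lysy": (2.385, 2.429),
}

for name, (data, both) in table1.items():
    print(f"{name}: {additional_loss_pct(data, both):.2f}%")
```

For the Morris-Lysy row this gives roughly 1.84%, in line with the value quoted in the text.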

The analysis by Steed et al. (1) is a timely companion to the rapid emergence
of differential privacy as the new formal privacy standard for statistical
disclosure limitation (SDL), anticipating possible adoption in complex survey
programs at the Census Bureau (9, 10) and at the IRS (11). The privacy revamp
has been met with critical feedback from data users (12), who question the
usability of differentially private data products after deliberate noise
injection, which instills distrust both in the data product and in the
competence of the curator. The privacy innovation inadvertently ruptured, in
the words of (13), a “statistical imaginary” that official statistics are
somehow pristine. Steed et al. (1) point out that data users’ distrust may be
misplaced, as the impact of errors and uncertainty stemming from sampling,
response, measurement, reporting, and editing may dominate that of errors from
privacy. Their analysis exposes the need to examine, accurately and often, the
extent to which every error source in an official statistical product affects
policy decisions. The development of quality assessment tools that are
theoretically sound, substantively relevant, and practically deployable calls
for quantitative research.
statistical uncertainty and privacy”
We offer our thanks to the authors for their thoughtful comments. Cui, Gong, Hannig, and Hoffman
propose a valuable improvement to our method of estimating lost entitlements due to data error. 
Because we don’t have access to the unknown, “true” number of children in poverty, our paper 
simulates data error by drawing counterfactual estimates from a normal distribution around the official, 
published poverty estimates, which we use to calculate lost entitlements relative to the official 
allocation of funds. But, if we make the more realistic assumption that the published estimates are 
themselves normally distributed around the “true” number of children in poverty, Cui et al.’s proposed 
framework allows us to reliably estimate lost entitlements relative to the unknown, ideal allocation of 
funds—what districts would have received if we knew the “true” number of children in poverty. 
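
The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the paper: the proportional `allocate` rule stands in for the actual funding formula, the district figures are hypothetical, and "lost entitlement" is taken here to mean the total shortfall of districts that receive less under a counterfactual draw than under the official allocation.

```python
import random

def allocate(counts, budget):
    """Split a fixed budget in proportion to counts (a simplified stand-in formula)."""
    total = sum(counts)
    return [budget * c / total for c in counts]

def expected_lost_entitlement(published, std_errs, budget, n_draws=2000, seed=0):
    """Monte Carlo estimate of the expected total shortfall relative to the official allocation."""
    rng = random.Random(seed)
    official = allocate(published, budget)
    total_loss = 0.0
    for _ in range(n_draws):
        # Counterfactual estimates: normal draws centered on the published values,
        # floored at zero because counts cannot be negative.
        draw = [max(0.0, rng.gauss(x, s)) for x, s in zip(published, std_errs)]
        counterfactual = allocate(draw, budget)
        # A district "loses" whatever it falls short of its official amount.
        total_loss += sum(max(0.0, o - c) for o, c in zip(official, counterfactual))
    return total_loss / n_draws

# Hypothetical published estimates and standard errors for five districts.
published = [120.0, 450.0, 900.0, 3_000.0, 15_000.0]
std_errs = [40.0, 90.0, 150.0, 300.0, 900.0]
loss = expected_lost_entitlement(published, std_errs, budget=1_000_000.0)
print(f"expected lost entitlement: {loss:,.0f}")
```

Under the alternative assumption, the published estimates would themselves be treated as noisy draws around unknown true counts, and the reference allocation would be built from estimates of those true counts rather than from the published values.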


Cui et al. show that when we measure losses relative to the ideal entitlements
(rather than relative to fixed published estimates) under this assumption, the
impacts of data error could be even larger. Under one possible approach within
this framework (the Hudson-Berger construction), Cui et al.’s estimates of lost
entitlements due to data error and privacy noise are very close to ours. Under
another approach (Morris-Lysy), their estimate of lost entitlements due to data
error nearly doubles, exceeding losses due to privacy noise by even more than
we estimated.

We recommend Cui et al.’s framework to 
future studies of the effect of data quality on 
policy outcomes, and we look forward to fu- 
ture research in this area—in particular, guid- 
ance on which shrinkage constructions are 
most appropriate for a given evidence-based 
policy setting. 
WORKING LIFE


By Greta Faccio 


One job wasn't enough 


I sat at my desk wondering whether I would ever feel as engaged in and proud of my work as I
had in academia. My job in the R&D division of a cosmetics company was coming easy to me and 
resulting in products on the shelves. But I thought constantly about what else I could do. I missed 
the feeling of risk and adventure that being a scientist at the edge of a discovery gives. I knew I 
didn’t want to go back to academia, where I would have to hyperspecialize and study the same 
thing every day. But I couldn’t picture myself in that industry job for years and years. It was time 
to get creative and find a different solution—or, as it turned out, a combination of them. 


I had made the move to the private 
sector after 5 years as a postdoc, 
when I became disenchanted with 
the instability and lack of fund- 
ing that is inherent in academia. I 
started my job search by reaching 
out to Ph.D.s in industry—people 
I’d worked with or found on social 
media. I asked about their career 
choices and what they liked and 
disliked about their jobs. Many 
told me they enjoyed the security 
that came with their long-term 
contracts, as well as their clearly 
defined job responsibilities. That 
sounded appealing. 

Through one of those contacts, I 
landed a job identifying scientific ev- 
idence behind cosmetic ingredients 
and researching new technologies. 
The stability and great workplace 
atmosphere lifted my spirits. But the 
work was mostly literature searches 
and summaries, and after a while it didn’t excite me anymore. 

I thought back to what I loved about academia. I had al- 
ways enjoyed that my days were varied, cycling through a 
range of activities: planning experiments, attending meet- 
ings, problem solving, lab work, discussing results, and other 
tasks. Research kept me on my toes and challenged me to de- 
vise creative solutions that could be communicated to a wider 
audience. I wasn’t getting that with my position in industry. 

I looked for other options in the private sector. But no 
one job offered the variety and chance to explore that I was 
missing. That’s when I asked: Did I really have to choose just 
one? It might be time to experiment. So instead of pursuing a 
linear and conventional path, I decided to combine multiple 
jobs—all focusing on my passion for innovation, all with room 
for personal growth. 

Part-time positions that offered what I was looking for
were hard to find. But after casting a broad net and reaching
out to companies that I thought might need support in
their research and scientific communication, I met employers
who were open to unconventional solutions. Through calls and
lunches together, we identified interesting tasks that were
too small to justify a full-time position.

I slowly built up my portfolio, 
eventually filling out my schedule 
with three part-time positions. I 
started by joining a global company 
as intellectual property manager, 
overseeing its patent and trade- 
mark portfolio—work that only 
requires a few mornings a week. 
Later, I added another role one to 
two mornings a week working as a 
patent scientist in a law firm that 
specializes in intellectual property. 
When I’m not doing those jobs, I’m 
an independent consultant for food 
and cosmetic startups, helping 
scout technology and communicate 
the scientific data behind their products. 

This work situation has given me the vibrant and varied 
workdays I was after. My schedule is constantly changing 
and the series of projects I’m tasked to work on are always 
new, affording me opportunities to learn and grow. I may 
not be making new scientific discoveries. But I get to work 
on challenging and sometimes uncertain projects—especially 
when preparing patents—which gives me the rush of excite- 
ment I was looking for. The flexible schedule also gives me 
time to be a mum in the afternoons and the freedom to work 
remotely from a variety of locations, including from where 
my parents live in Italy. 

It was risky for me to try to piece together a career from 
multiple part-time positions. But it has turned out to be more 
practical than I imagined—and as rewarding as I hoped. 


Greta Faccio works in intellectual property and is based in St. Gallen, 
Switzerland. Send your career story to SciCareerEditor@aaas.org. 